[
https://issues.apache.org/jira/browse/SQOOP-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Harsh J updated SQOOP-1293:
---------------------------
Environment: (was: Hortonworks Sandbox 2.0)
> --hive-import causes --target-dir and --warehouse-dir to not be respected,
> nor --delete-target-dir
> --------------------------------------------------------------------------------------------------
>
> Key: SQOOP-1293
> URL: https://issues.apache.org/jira/browse/SQOOP-1293
> Project: Sqoop
> Issue Type: Bug
> Components: connectors/sqlserver, hive-integration
> Affects Versions: 1.4.4
> Reporter: Hari Sekhon
> Priority: Minor
>
> Hi,
> I'm importing a table from SQL Server 2012 and am using --hive-import to
> create the metadata automatically, but am finding that it causes --target-dir,
> --warehouse-dir and --delete-target-dir to be ignored.
> {noformat}
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>   --username omitted --password omitted \
>   --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>   --table "cube.DimCounterParty" --split-by CounterpartyKey \
>   --hive-import --target-dir /MyDatabase/CounterParty --delete-target-dir
> {noformat}
> (fyi I'm using --driver to work around bug SQOOP-1292)
> So I tried --warehouse-dir in case it needed that instead of --target-dir:
> {noformat}
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>   --username omitted --password omitted \
>   --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>   --table "cube.DimCounterParty" --split-by CounterpartyKey \
>   --hive-import --warehouse-dir /MyDatabase/CounterParty --delete-target-dir
> {noformat}
> but in both cases it ingested the data to
> /apps/hive/warehouse/cube.db/dimcounterparty.
> What's also strange is that it created the directory specified for
> --warehouse-dir but then didn't appear to place the data in it.
> I wanted to use --delete-target-dir to replace the whole table each time for
> this test since the source table is only ~650,000 rows and 185MB.
> What I've found is that, on top of ingesting into
> /apps/hive/warehouse/cube.db/dimcounterparty, it disregards
> --delete-target-dir, so the table grows cumulatively with each run: after a
> few runs a {noformat}select count(*){noformat} on the table shows 5,546,661
> rows instead of the original ~650,000.
> Here is the Hive warehouse directory on HDFS, where you can see the
> accumulation of the data:
> {noformat}
> hadoop fs -ls /apps/hive/warehouse/cube.db/dimcounterparty/
> Found 40 items
> -rw-r--r-- 3 root hdfs 0 2014-03-07 08:44
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:10
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_1
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:33
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_2
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:37
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_3
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:42
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_4
> -rw-r--r-- 3 root hdfs 0 2014-03-07 10:04
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_5
> -rw-r--r-- 3 root hdfs 0 2014-03-07 10:14
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_6
> -rw-r--r-- 3 root hdfs 0 2014-03-07 10:16
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_7
> -rw-r--r-- 3 root hdfs 49044407 2014-03-07 08:44
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000
> -rw-r--r-- 3 root hdfs 49045389 2014-03-07 09:10
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_1
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 09:33
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_2
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 09:37
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_3
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 09:41
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_4
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 10:04
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_5
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 10:14
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_6
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 10:15
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_7
> -rw-r--r-- 3 root hdfs 52363518 2014-03-07 08:44
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001
> -rw-r--r-- 3 root hdfs 52363912 2014-03-07 09:10
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_1
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 09:33
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_2
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 09:37
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_3
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 09:41
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_4
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 10:03
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_5
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 10:14
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_6
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 10:15
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_7
> -rw-r--r-- 3 root hdfs 51796051 2014-03-07 08:44
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002
> -rw-r--r-- 3 root hdfs 51796027 2014-03-07 09:10
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_1
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 09:33
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_2
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 09:37
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_3
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 09:41
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_4
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 10:03
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_5
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 10:14
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_6
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 10:15
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_7
> -rw-r--r-- 3 root hdfs 45445570 2014-03-07 08:44
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003
> -rw-r--r-- 3 root hdfs 45445544 2014-03-07 09:10
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_1
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 09:33
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_2
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 09:37
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_3
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 09:42
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_4
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 10:04
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_5
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 10:14
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_6
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 10:16
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_7
> {noformat}
> Is this a bug that it doesn't respect --target-dir or at least
> --warehouse-dir?
> This highlights another issue: this behaviour should be more intuitive.
> {noformat}sqoop import --help{noformat} should make it easier to see which
> options are (in)compatible, or the output at job initiation time should state
> which switches will be disregarded, as it already does when a
> {noformat}--hive-<option>{noformat} is used without
> {noformat}--hive-import{noformat}.
> At my last workplace I recall using {noformat}sqoop create-hive-table{noformat}
> to generate the metadata after the import and then editing the table location
> metadata. It would be a lot better if we could fix the behaviour of
> --hive-import so it doesn't require such a multi-step workaround.
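> For reference, a rough sketch of that multi-step workaround (connection
> details are the placeholders from the commands above; the exact ALTER TABLE
> step and the lowercase Hive table name are my assumptions):

```shell
# Sketch of the multi-step workaround: plain import, then generate the Hive
# metadata, then repoint the table. Guarded so it is a no-op on machines
# without sqoop/hive installed.
CONNECT="jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase"
TABLE="cube.DimCounterParty"
TARGET_DIR="/MyDatabase/CounterParty"

if command -v sqoop >/dev/null 2>&1; then
  # Step 1: plain import (no --hive-import), which does honour
  # --target-dir and --delete-target-dir
  sqoop import --connect "$CONNECT" --username omitted --password omitted \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --table "$TABLE" --split-by CounterpartyKey \
    --target-dir "$TARGET_DIR" --delete-target-dir

  # Step 2: generate the Hive table metadata from the source schema
  sqoop create-hive-table --connect "$CONNECT" \
    --username omitted --password omitted --table "$TABLE"

  # Step 3: repoint the Hive table at the directory the data landed in
  # (table name assumed; Hive lowercases it on creation)
  hive -e "ALTER TABLE dimcounterparty SET LOCATION '${TARGET_DIR}'"
fi
```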
> Thanks
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)