[ 
https://issues.apache.org/jira/browse/SQOOP-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999709#comment-13999709
 ] 

Hari Sekhon edited comment on SQOOP-1293 at 2/17/15 12:17 PM:
--------------------------------------------------------------

Please excuse me if I was unclear - I don't believe this is invalid. My 
ex-colleague Jarek Jarcec Cecho at Cloudera, who is a Sqoop committer, 
coincidentally and independently confirmed around the same time that this was 
a TODO, here:

http://stackoverflow.com/questions/22344781/how-to-import-table-as-external-table-using-sqoop

https://issues.apache.org/jira/browse/SQOOP-816

SQOOP-816 states the Hive import to external table requirement more simply, 
but I believe this Jira raises a few additional points around the real-world 
usage pattern.

Obviously multi-step workarounds are possible but inelegant. The HCatalog 
reference doesn't make much sense to me in relation to externally managed full 
table replacements.

Requirement, succinctly reiterated:

Import a table from a database into Hive as an external table, placing the 
data at a given path in HDFS and deleting that directory first if it exists 
to avoid cumulative data build-up - i.e. perform a total external table 
refresh operation from source.
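To make the current multi-step workaround concrete, it looks roughly like the 
following - a sketch only: the connection string, credentials, paths and the 
column list are hypothetical/elided, and the hive invocation may vary by 
distribution:

{noformat}
# 1. Import to an explicit HDFS path, replacing any previous run's data
#    (without --hive-import, --target-dir and --delete-target-dir are honoured)
sqoop import \
  --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
  --username omitted --password omitted \
  --table "cube.DimCounterParty" --split-by CounterpartyKey \
  --target-dir /MyDatabase/CounterParty --delete-target-dir

# 2. Separately create an external table over that path (first run only;
#    column definitions elided here)
hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS cube.dimcounterparty ( ... )
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/MyDatabase/CounterParty';"
{noformat}

A single --hive-import invocation that performed both steps - refresh the 
directory and keep the external table pointed at it - is what this Jira is 
asking for.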

Regards,

Hari


was (Author: harisekhon):
This isn't really invalid, my ex-colleague Jarek Jarcec Cecho at Cloudera who 
is a Sqoop committer coincidentally independently confirmed this was a TODO 
around the same time here:

http://stackoverflow.com/questions/22344781/how-to-import-table-as-external-table-using-sqoop

https://issues.apache.org/jira/browse/SQOOP-816

The Jira SQOOP-816 more simply states the hive import to external table 
requirement but I believe I raise a few more points in this Jira around the 
real world usage pattern.

Obviously multi-step workarounds are possible but inelegant. The HCatalog 
reference doesn't make much sense to me in relation to externally managed full 
table replacements.

Requirement succinctly reiterated:

Import table from database into hive as external table placing data in given 
path in HDFS, deleting the directory if it exists to avoid cumulative data 
build up - ie do a total external table refresh operation from source.

Regards,

Hari

> --hive-import causes --target-dir and --warehouse-dir to not be respected, 
> nor --delete-target-dir
> --------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1293
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1293
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/sqlserver, hive-integration
>    Affects Versions: 1.4.4
>            Reporter: Hari Sekhon
>            Priority: Minor
>
> Hi,
> I'm importing a table from SQL Server 2012 and am using --hive-import to 
> create the metadata automatically, but am finding that it causes --target-dir 
> and --warehouse-dir to not be respected, nor --delete-target-dir.
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" 
> --username omitted --password omitted --driver 
> com.microsoft.sqlserver.jdbc.SQLServerDriver --table "cube.DimCounterParty" 
> --split-by CounterpartyKey --hive-import --target-dir 
> /MyDatabase/CounterParty --delete-target-dir
> (fyi I'm using --driver to work around bug SQOOP-1292)
> So I tried --warehouse-dir in case it needed that instead of --target-dir
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" 
> --username omitted --password omitted --driver 
> com.microsoft.sqlserver.jdbc.SQLServerDriver --table "cube.DimCounterParty" 
> --split-by CounterpartyKey --hive-import --warehouse-dir 
> /MyDatabase/CounterParty --delete-target-dir
> but in both cases it ingested the data to 
> /apps/hive/warehouse/cube.db/dimcounterparty.
> What's also strange is that it created the directory specified for 
> --warehouse-dir but then didn't appear to place the data in it.
> I wanted to use --delete-target-dir to replace the whole table each time for 
> this test since the source table is only ~650,000 rows and 185MB.
> What I've found is that on top of ingesting in to 
> /apps/hive/warehouse/cube.db/dimcounterparty by disregarding 
> --delete-target-dir it is causing the table volume to grow cumulatively for 
> each run, such that after a few runs the {noformat}select count(*){noformat} 
> on the table now shows 5,546,661 rows instead of 650,000.
> Here is the hive warehouse directory on HDFS where you can see the 
> accumulation of the data:
> {noformat}
>  hadoop fs -ls /apps/hive/warehouse/cube.db/dimcounterparty/
> Found 40 items
> -rw-r--r--   3 root hdfs          0 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_1
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_2
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_3
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:42 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_4
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:04 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_5
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_6
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:16 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_7
> -rw-r--r--   3 root hdfs   49044407 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000
> -rw-r--r--   3 root hdfs   49045389 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_1
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_2
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_3
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:41 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_4
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:04 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_5
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_6
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:15 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_7
> -rw-r--r--   3 root hdfs   52363518 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001
> -rw-r--r--   3 root hdfs   52363912 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_1
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_2
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_3
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:41 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_4
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:03 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_5
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_6
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:15 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_7
> -rw-r--r--   3 root hdfs   51796051 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002
> -rw-r--r--   3 root hdfs   51796027 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_1
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_2
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_3
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:41 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_4
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:03 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_5
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_6
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:15 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_7
> -rw-r--r--   3 root hdfs   45445570 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003
> -rw-r--r--   3 root hdfs   45445544 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_1
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_2
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_3
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:42 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_4
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:04 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_5
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_6
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:16 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_7
> {noformat}
> Is this a bug that it doesn't respect --target-dir or at least 
> --warehouse-dir?
> This highlights another issue: this behaviour should be more intuitive, 
> and/or {noformat}sqoop import --help{noformat} should make it easier to see 
> which options are (in)compatible, or the output at job initiation time 
> should state when switches will be disregarded, as it already does when 
> using {noformat}--hive-<option>{noformat} without 
> {noformat}--hive-import{noformat}
> At my last workplace I recall using sqoop create-hive-table to generate the 
> metadata after import and then editing the table location metadata. It would 
> be a lot better if the behaviour of --hive-import were fixed so as not to 
> require such a multi-step workaround.
> Thanks
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)