[ https://issues.apache.org/jira/browse/HADOOP-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711853#action_12711853 ]

Aaron Kimball commented on HADOOP-5887:
---------------------------------------

True. Though in Hive, LOAD DATA INPATH is implemented as an in-HDFS move, not a 
copy, so doing this neither performs (much) faster nor saves space. That said, 
a reasonable future improvement would be to add a flag that suppresses the move 
into the "public" warehouse directory and leaves the data in the user's home 
directory. It would be nice if any Hive mavens could comment on the use cases 
for which they use external vs. internal tables; as I see it, there's not a 
huge amount of difference between them.
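For context, the move-vs.-leave-in-place distinction above is roughly the difference between managed and external tables in Hive. A minimal sketch (table names, column types, and HDFS paths here are illustrative assumptions, not from the patch):

```sql
-- Managed (internal) table: LOAD DATA INPATH moves the HDFS files into
-- the warehouse directory; dropping the table later deletes the data.
CREATE TABLE imported_orders (id INT, total DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001';
LOAD DATA INPATH '/user/aaron/orders' INTO TABLE imported_orders;

-- External table: the data stays where it was uploaded;
-- DROP TABLE removes only the metastore entry.
CREATE EXTERNAL TABLE external_orders (id INT, total DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
LOCATION '/user/aaron/orders';
```

In both cases the query interface is identical; the choice mainly affects who owns the underlying files.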

> Sqoop should create tables in Hive metastore after importing to HDFS
> --------------------------------------------------------------------
>
>                 Key: HADOOP-5887
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5887
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: HADOOP-5887.patch
>
>
> Sqoop (HADOOP-5815) imports tables into HDFS; it is a straightforward 
> enhancement to then generate a Hive DDL statement to recreate the table 
> definition in the Hive metastore and move the imported table into the Hive 
> warehouse directory from its upload target.
> This feature enhancement makes that process automatic. An import is performed 
> with Sqoop in the usual way; providing the argument "--hive-import" will 
> cause it to then issue CREATE TABLE and LOAD DATA INPATH ... INTO TABLE 
> statements to a Hive shell. It generates a script file and attempts to run 
> "$HIVE_HOME/bin/hive" on it, or failing that, any "hive" on the $PATH; 
> $HIVE_HOME can be overridden with --hive-home. As a result, no direct 
> linking against Hive is necessary.
> The unit tests provided with this enhancement use a mock implementation of 
> 'bin/hive' that compares the script it's fed with one from a directory full 
> of "expected" scripts. The exact script file referenced is controlled via an 
> environment variable. It doesn't actually load into a proper Hive metastore, 
> but manual testing has shown that this process works in practice, so the mock 
> implementation is a reasonable unit testing tool.
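A generated script of the kind described above might look something like the following sketch. The table name, column types, delimiters, and HDFS path are hypothetical examples; the real script would be derived from the imported table's metadata:

```sql
-- Sketch of a script a "--hive-import" run might hand to bin/hive
-- (all identifiers here are illustrative assumptions).
CREATE TABLE IF NOT EXISTS employees (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
LOAD DATA INPATH '/user/aaron/employees' INTO TABLE employees;
```

Because LOAD DATA INPATH is an HDFS move, the second statement relocates the upload target into the Hive warehouse directory rather than copying it, which is the behavior discussed in the comment above.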

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.