-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/#review82254
-----------------------------------------------------------
Just a few notes:


src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java
<https://reviews.apache.org/r/33104/#comment132951>

    Honestly, this is the first time I'm seeing the "doFailIfHiveTableExists" method :) It seems unused in the current Sqoop code base, so I'm wondering whether it would be better to not use it here (and perhaps drop it completely in a different JIRA).


src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java
<https://reviews.apache.org/r/33104/#comment132950>

    Just a note on the log output: I believe the semantics of --create-hive-table are to create the table if it doesn't exist and do nothing if it does exist. I'm wondering whether the comment should just mention that this will "append" to existing Hive tables and that the user might consider --hive-overwrite if a rewrite is needed, e.g. with no mention of --create-hive-table. What do you think?


src/java/org/apache/sqoop/tool/BaseSqoopTool.java
<https://reviews.apache.org/r/33104/#comment132952>

    The --append parameter doesn't really make sense with Hive, because a Hive import behaves differently than an HDFS one, right? It's quite unfortunate, but it seems better to preserve the check so as not to confuse people even more.


- Jarek Cecho


On April 30, 2015, 5:54 p.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33104/
> -----------------------------------------------------------
> 
> (Updated April 30, 2015, 5:54 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2295
>     https://issues.apache.org/jira/browse/SQOOP-2295
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently, an existing dataset will throw an exception. This differs from
> `--as-textfile`. I've checked the user manual; the handling of HDFS and Hive
> is indeed different. For HDFS, unless `--append` is specified, the job will
> fail when the destination already exists.
> For Hive, unless `--create-hive-table` is specified, the job will run in
> append mode. The patch makes the handling of `--as-textfile` and
> `--as-parquetfile` consistent.
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hive-args.txt 7d9e427 
>   src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
>   src/docs/user/create-hive-table.txt 3aa34fd 
>   src/docs/user/hive-args.txt 53de92d 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java c97bb58 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
>   testdata/hive/scripts/normalImportAsParquet.q e434e9b 
> 
> Diff: https://reviews.apache.org/r/33104/diff/
> 
> 
> Testing
> -------
> 
> Manually tested append, new create and overwrite cases.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>
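For readers following the thread, the flag semantics under discussion can be sketched with command lines like the following. This is an illustrative sketch only: the JDBC connect string, username, and table name are placeholders, not values from the patch.

```shell
# Default Hive import as Parquet: if the Hive table already exists,
# the import appends to it (the behavior the patch makes consistent
# with --as-textfile).
sqoop import \
  --connect jdbc:mysql://db.example.com/corp --username sqoop_user \
  --table employees \
  --hive-import --as-parquetfile

# With --create-hive-table: the job fails if the target Hive table
# already exists.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp --username sqoop_user \
  --table employees \
  --hive-import --as-parquetfile --create-hive-table

# With --hive-overwrite: existing data in the Hive table is replaced
# instead of appended to.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp --username sqoop_user \
  --table employees \
  --hive-import --as-parquetfile --hive-overwrite
```

The commands are not runnable without a live database and a Sqoop installation; they only illustrate which flag selects which existing-table behavior.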
