-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------


Hi Venkat,
Thank you for incorporating my comments; it's greatly appreciated. I've taken a 
deep look again, and I have the following additional comments:

1) Can we add the HCatalog tests into the ThirdPartyTests suite? 
https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java

2) It seems that using --create-hcatalog-table will create the table and exit 
Sqoop without doing the import:

[root@bousa-hcat ~]# sqoop import --connect 
jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop 
--table text --hcatalog-table text --create-hcatalog-table
13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the 
command-line is insecure. Consider using -P instead.
13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming 
resultset.
13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM `text` AS t LIMIT 1
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM `text` AS t LIMIT 1
13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is 
/usr/lib/hadoop-mapreduce
13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses 
or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: 
/tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing 
from mysql.
13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use 
the --direct
13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a 
MySQL-specific fast path.
13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to 
convertToNull (mysql)
13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import 
job
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific 
details for job
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may 
fail if needed jar files are not found correctly.  Please set HIVE_HOME in 
sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job 
may fail if needed jar files are not found correctly.  Please set HCAT_HOME in 
sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to 
/usr/lib/hcatalog
13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM `text` AS t LIMIT 1
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected 
: [id, txt]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map 
:
        Names: [id, txt]
        Types : [4, 12]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table 
default.text for import
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table 
statement: 

create table default.text (
        id int,
        txt string)
stored as rcfile
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI 
in-process.
Hive history 
file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
OK
Time taken: 25.121 seconds
[root@bousa-hcat ~]#




src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44346>

    Can we add information here about what will happen if the table already 
exists and this parameter is specified?



src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44347>

    This seems unnecessary; can we tweak the bash scripts to do this 
automatically if the hcat command is present?



src/java/org/apache/sqoop/mapreduce/ExportJobBase.java
<https://reviews.apache.org/r/10688/#comment44350>

    Nit: I think that this line can also be refactored into the parent class, 
right?



src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
<https://reviews.apache.org/r/10688/#comment44349>

    Nit: I think that this line can also be refactored into the parent class, 
right?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java
<https://reviews.apache.org/r/10688/#comment44351>

    This method seems to be required only for the debug message. Is that the 
only reason, or did I miss something?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44369>

    Nit: It seems that we are doing "options = opts;" in all cases, so maybe it 
would be worth putting this line before the "if" statement?
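    A minimal sketch of the hoisting I have in mind (the class and field names 
here are illustrative, not the actual ones in SqoopHCatUtilities):

```java
// Illustrative only: hoist the assignment that every branch repeats.
public class OptionsHolder {
  private Object options;

  // Before: each branch of the "if" did "options = opts;" separately.
  // After: assign once, before the branch.
  public void configure(Object opts, boolean exportJob) {
    options = opts; // common to all cases, so do it first
    if (exportJob) {
      // export-specific configuration would go here
    } else {
      // import-specific configuration would go here
    }
  }

  public Object getOptions() {
    return options;
  }
}
```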



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44371>

    Nit: Shouldn't the default Hive home be in SqoopOptions.getDefaultHiveHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44372>

    Shouldn't the default HCatalog home be in SqoopOptions.getDefaultHcatHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44397>

    Both Hive and HBase are idempotent when creating tables, so it might make 
sense to add "IF NOT EXISTS" in order to remain consistent.
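    For example, the statement from the log above would then read (a sketch 
based on the generated DDL shown earlier):

```sql
-- Idempotent variant: a no-op if the table already exists,
-- consistent with how Hive and HBase table creation behaves.
create table if not exists default.text (
        id int,
        txt string)
stored as rcfile
```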



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44395>

    It seems that at this point we are not reading the Hive configuration 
files, yet we are executing the in-process Hive CLI. As a result, the CLI will 
not pick up the configuration files and will use defaults, which is 
inconsistent with the executed MapReduce job that will use the proper 
configuration files. The table could therefore be created in a different 
metastore than the one into which we are importing data.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44378>

    Shouldn't we use SqoopOptions.getDefaultHiveHome() here?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44379>

    Shouldn't we use SqoopOptions.getDefaultHCatHome() here?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44382>

    Nit: Considering that there might be a Hadoop 3 in the future, would it be 
simpler to change the condition to (isLocalMode && isHadoop1) instead of 
enumerating all other possible Hadoop versions?
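    A sketch of the check I have in mind (the method names are assumed for 
illustration; the real code would consult the detected Hadoop version string):

```java
// Illustrative sketch: test for Hadoop 1.x directly instead of
// enumerating every non-1.x major version.
public class HadoopVersionCheck {
  public static boolean isHadoop1(String version) {
    return version != null && version.startsWith("1.");
  }

  // Suggested condition: (isLocalMode && isHadoop1), which stays correct
  // if a Hadoop 3 (or later) ever appears.
  public static boolean needsLocalModeHandling(boolean isLocalMode,
      String version) {
    return isLocalMode && isHadoop1(version);
  }
}
```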



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44386>

    Nit: Those lines seem to be unused.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44387>

    Can we write the file in a temporary directory rather than in the current 
working directory (which might not be writable)?
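    A minimal sketch of what I mean (the file prefix, suffix, and method name 
are made up for illustration):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Illustrative sketch: place the generated script under java.io.tmpdir,
// which is writable far more reliably than the current working directory.
public class ScriptWriter {
  public static File writeScript(String contents) throws IOException {
    // createTempFile uses the system temporary directory by default.
    File script = File.createTempFile("hcat-script-", ".hcatalog");
    script.deleteOnExit();
    try (FileWriter w = new FileWriter(script)) {
      w.write(contents);
    }
    return script;
  }
}
```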



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44388>

    I would suggest altering this to a single line:
    
    LOG.error("Error writing HCatalog load-in script: ", ioe);
    
    That will also print the stack trace.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44389>

    I would suggest changing this line to:
    
    LOG.warn("IOException closing stream to HCatalog script: ", ioe);
    
    That will also print out the stack trace.


Jarcec

- Jarek Cecho


On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 4:16 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop.   
> With this feature, it is possible to import and export data between Sqoop and 
> HCatalog tables.   The document attached to the SQOOP-931 JIRA issue discusses 
> the high-level approaches.  
> 
> With this integration, more fidelity can be brought to the process of moving 
> data between enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been 
> added to test various aspects of the integration.  A unit test to test the 
> option management is also added.   All tests pass.
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>
