----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10688/#review21420 -----------------------------------------------------------
Hi Venkat, Thank you for incorporating my comments, greatly appreciated. I've took a deep look again and I do have following additional comments: 1) Can we add the HCatalog tests into ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java 2) It seems that using --create-hcatalog-table will create the table and exist Sqoop without doing the import: [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql. 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path. 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql) 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly. Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option. Setting HIVE_HOME to /usr/lib/hive 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly. Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option. Setting HCAT_HOME to /usr/lib/hcatalog 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt] 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map : Names: [id, txt] Types : [4, 12] 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: create table default.text ( id int, txt string) stored as rcfile 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process. Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt OK Time taken: 25.121 seconds [root@bousa-hcat ~]# src/docs/user/hcatalog.txt <https://reviews.apache.org/r/10688/#comment44346> Can we add here information what will happen if the table already exists and this parameter is specified? src/docs/user/hcatalog.txt <https://reviews.apache.org/r/10688/#comment44347> This seem unnecessary, can we tweak the bash scripts to do this automatically if the hcat command is present? src/java/org/apache/sqoop/mapreduce/ExportJobBase.java <https://reviews.apache.org/r/10688/#comment44350> Nit: I think that this line can be also refactored to the parent class right? src/java/org/apache/sqoop/mapreduce/ImportJobBase.java <https://reviews.apache.org/r/10688/#comment44349> Nit: I think that this line can be also refactored to the parent class right? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java <https://reviews.apache.org/r/10688/#comment44351> This method seems to be required only for the debug message. Is it the only reason or did I miss something? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44369> Nit: It seems that we are doing the options = opts; every in all cases so maybe it would be worth putting this line before "if" statement? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44371> Nit: Shouldn't be default Hive home in SqoopOptions.getDefaultHiveHome()? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44372> Shouldn't be default Hive home in SqoopOptions.getDefaultHcatHome()? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44397> Both Hive and HBase are idempotent when creating tables, so It might make sense to add "IF NOT EXISTS" in order to remain consistent. src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44395> It seems that at this point we are not reading the hive configuration files but yet executing the in-process Hive CLI that will as a result not pick up the configuration file and will use defaults that is not consistent with the executed mapreduce job that will use the proper configuration files. As a result the table will be created in different metastore then into which we are importing data. src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44378> Shouldn't we use here the SqoopOptions.getDefaultHiveHome()? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44379> Shouldn't we use here the SqoopOptions.getDefaultHCatHome()? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44382> Nit: Considering that there might be Hadoop3 in the future, would it be simple to change the condition to (isLocalMode and isHadoop1) instead of enumerating all other possible hadoop versions? src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44386> Nit: Those lines seems to be unused. src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44387> Can we write the file in temporary directory rather than in current working directory? (that might not be writable). src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44388> I would suggest to alter this to single line: LOG.error("Error writing HCatalog load-in script: ", ioe); That will also print the stack trace. src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java <https://reviews.apache.org/r/10688/#comment44389> I would suggest to change this line to : LOG.warn("IOException closing stream to HCatalog script: ", ioe); That will also print out the stack trace. Jarcec - Jarek Cecho On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/10688/ > ----------------------------------------------------------- > > (Updated June 3, 2013, 4:16 a.m.) > > > Review request for Sqoop and Jarek Cecho. > > > Description > ------- > > This patch implements the new feature of integrating HCatalog and Sqoop. > With this feature, it is possible to import and export data between Sqoop and > HCatalog tables. The document attached to SQOOP-931 JIRA issue discusses > the high level appraches. > > With this integration, more fidelity can be brought to the process of moving > data between enterprise data stores and hadoop ecosystem. > > > Diffs > ----- > > build.xml 636c103 > ivy.xml 1fa4dd1 > ivy/ivysettings.xml c4cc561 > src/docs/user/SqoopUserGuide.txt 01ac1cf > src/docs/user/hcatalog.txt PRE-CREATION > src/java/org/apache/sqoop/SqoopOptions.java f18d43e > src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 > src/java/org/apache/sqoop/hive/HiveImport.java 838f083 > src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e > src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b > src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f > src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 > src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java > PRE-CREATION > src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f > src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 > src/java/org/apache/sqoop/tool/ExportTool.java 215addd > src/java/org/apache/sqoop/tool/ImportTool.java 2627726 > src/perftest/ExportStressTest.java 0a41408 > src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 > src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 > src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df > src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION > src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION > src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION > src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION > testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION > testdata/hcatalog/conf/hive-site.xml PRE-CREATION > testdata/hcatalog/conf/log4j.properties PRE-CREATION > > Diff: https://reviews.apache.org/r/10688/diff/ > > > Testing > ------- > > Two new integration test suites with more than 20 tests in total have been > added to test various aspects of the integration. A unit test to test the > option management is also added. All tests pass > > > Thanks, > > Venkat Ranganathan > >
