> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > Hi Venkat, > > Thank you for incorporating my comments, greatly appreciated. I've took a > > deep look again and I do have following additional comments: > > > > 1) Can we add the HCatalog tests into ThirdPartyTest suite? > > https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java > > > > 2) It seems that using --create-hcatalog-table will create the table and > > exist Sqoop without doing the import: > > > > [root@bousa-hcat ~]# sqoop import --connect > > jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop > > --table text --hcatalog-table text --create-hcatalog-table > > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the > > command-line is insecure. Consider using -P instead. > > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL > > streaming resultset. > > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation > > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT > > t.* FROM `text` AS t LIMIT 1 > > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT > > t.* FROM `text` AS t LIMIT 1 > > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is > > /usr/lib/hadoop-mapreduce > > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: > > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar > > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java > > uses or overrides a deprecated API. > > Note: Recompile with -Xlint:deprecation for details. > > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: > > /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar > > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are > > importing from mysql. > > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! > > Use the --direct > > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a > > MySQL-specific fast path. > > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior > > to convertToNull (mysql) > > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text > > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for > > import job > > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog > > specific details for job > > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job > > may fail if needed jar files are not found correctly. Please set HIVE_HOME > > in sqoop-env.sh or provide --hive-home option. Setting HIVE_HOME to > > /usr/lib/hive > > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. > > job may fail if needed jar files are not found correctly. Please set > > HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option. Setting > > HCAT_HOME to /usr/lib/hcatalog > > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT > > t.* FROM `text` AS t LIMIT 1 > > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names > > projected : [id, txt] > > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type > > map : > > Names: [id, txt] > > Types : [4, 12] > > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table > > default.text for import > > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table > > statement: > > > > create table default.text ( > > id int, > > txt string) > > stored as rcfile > > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI > > in-process. > > Hive history > > file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt > > OK > > Time taken: 25.121 seconds > > [root@bousa-hcat ~]# > > > >
Sure, I can add it to that. --create-hcatalog-table - It seems to work by chance - That is, after creating the table a bunch of stuff is done that is not needed. I will add additional checks there > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/docs/user/hcatalog.txt, lines 284-285 > > <https://reviews.apache.org/r/10688/diff/9/?file=299864#file299864line284> > > > > This seem unnecessary, can we tweak the bash scripts to do this > > automatically if the hcat command is present? Good point. Since I modified the hive unit tests to function correctly in the presence of real hive environment, this can be easily done. > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java, line 95 > > <https://reviews.apache.org/r/10688/diff/9/?file=299870#file299870line95> > > > > Nit: I think that this line can be also refactored to the parent class > > right? Yes. One thing to note is that by moving the isHCatJob to the parent class we lost the ability to mark it as final. Let me rework it > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/ImportJobBase.java, line 85 > > <https://reviews.apache.org/r/10688/diff/9/?file=299871#file299871line85> > > > > Nit: I think that this line can be also refactored to the parent class > > right? Please see above > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java, lines > > 131-137 > > <https://reviews.apache.org/r/10688/diff/9/?file=299874#file299874line131> > > > > This method seems to be required only for the debug message. Is it the > > only reason or did I miss something? Yes, it is needed for debugging purpose when we want to know when the sub record reader or main record reader are called > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines > > 237-241 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line237> > > > > Nit: It seems that we are doing the options = opts; every in all cases > > so maybe it would be worth putting this line before "if" statement? Sure > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 249 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line249> > > > > Nit: Shouldn't be default Hive home in > > SqoopOptions.getDefaultHiveHome()? Yes. The message needs fixing > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 257 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line257> > > > > Shouldn't be default Hive home in SqoopOptions.getDefaultHcatHome()? Yes. As above > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 491 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line491> > > > > Both Hive and HBase are idempotent when creating tables, so It might > > make sense to add "IF NOT EXISTS" in order to remain consistent. Good point. I think we will otherwise earlier, but for consistency I think we should do this. Will change > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 523 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line523> > > > > It seems that at this point we are not reading the hive configuration > > files but yet executing the in-process Hive CLI that will as a result not > > pick up the configuration file and will use defaults that is not consistent > > with the executed mapreduce job that will use the proper configuration > > files. As a result the table will be created in different metastore then > > into which we are importing data. Hive and hcat configuration files and jars have to be in the classpath brought in by hcat -classpath. Do you think that is not always in the configuration? When I update the configure sqoop script, I will make sure the hive conf is added. > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines > > 749-750 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line749> > > > > Shouldn't we use here the SqoopOptions.getDefaultHiveHome()? Yes. WIll fix > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines > > 871-875 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line871> > > > > Nit: Those lines seems to be unused. Good catch - earlier I had the ability to execute a command line but removed it in favor of a simpler model. Will remove it > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 876 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line876> > > > > Can we write the file in temporary directory rather than in current > > working directory? (that might not be writable). Sure will change > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/docs/user/hcatalog.txt, line 160 > > <https://reviews.apache.org/r/10688/diff/9/?file=299864#file299864line160> > > > > Can we add here information what will happen if the table already > > exists and this parameter is specified? Sure. Will do. > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines > > 898-899 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line898> > > > > I would suggest to alter this to single line: > > > > LOG.error("Error writing HCatalog load-in script: ", ioe); > > > > That will also print the stack trace. Sure will do > On June 4, 2013, 11:15 p.m., Jarek Cecho wrote: > > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines > > 906-907 > > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line906> > > > > I would suggest to change this line to : > > > > LOG.warn("IOException closing stream to HCatalog script: ", ioe); > > > > That will also print out the stack trace. Sure will do - Venkat ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10688/#review21420 ----------------------------------------------------------- On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/10688/ > ----------------------------------------------------------- > > (Updated June 3, 2013, 4:16 a.m.) > > > Review request for Sqoop and Jarek Cecho. > > > Description > ------- > > This patch implements the new feature of integrating HCatalog and Sqoop. > With this feature, it is possible to import and export data between Sqoop and > HCatalog tables. The document attached to SQOOP-931 JIRA issue discusses > the high level appraches. > > With this integration, more fidelity can be brought to the process of moving > data between enterprise data stores and hadoop ecosystem. > > > Diffs > ----- > > build.xml 636c103 > ivy.xml 1fa4dd1 > ivy/ivysettings.xml c4cc561 > src/docs/user/SqoopUserGuide.txt 01ac1cf > src/docs/user/hcatalog.txt PRE-CREATION > src/java/org/apache/sqoop/SqoopOptions.java f18d43e > src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 > src/java/org/apache/sqoop/hive/HiveImport.java 838f083 > src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e > src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b > src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f > src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 > src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java > PRE-CREATION > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java > PRE-CREATION > src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f > src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 > src/java/org/apache/sqoop/tool/ExportTool.java 215addd > src/java/org/apache/sqoop/tool/ImportTool.java 2627726 > src/perftest/ExportStressTest.java 0a41408 > src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 > src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 > src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df > src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION > src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION > src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION > src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION > testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION > testdata/hcatalog/conf/hive-site.xml PRE-CREATION > testdata/hcatalog/conf/log4j.properties PRE-CREATION > > Diff: https://reviews.apache.org/r/10688/diff/ > > > Testing > ------- > > Two new integration test suites with more than 20 tests in total have been > added to test various aspects of the integration. A unit test to test the > option management is also added. All tests pass > > > Thanks, > > Venkat Ranganathan > >
