-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review25304
-----------------------------------------------------------
Hi Hari, thank you again for working on this JIRA, and please accept my
apologies for the late response. I've taken a deep dive into the patch and I
have a couple of comments and questions.


common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49646>

    Extreme nit: Read data from the execution framework... (not necessarily MR)

common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49647>

    Extreme nit: Read data from the execution framework...

common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49648>

    Nit: Read data from the execution framework in a native format. This
    should be an independent native format, right?

common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49649>

    Extreme nit: Write an array of objects into the execution framework...

common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49650>

    Extreme nit: Write data into the execution framework...

common/src/main/java/org/apache/sqoop/schema/type/Column.java
<https://reviews.apache.org/r/12936/#comment49651>

    Can we add a descriptive javadoc describing what exactly we are expecting
    to validate?

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50066>

    These imports seem to be unused.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50067>

    We should also take exportJobConfiguration.table.schema into account when
    building the query.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
<https://reviews.apache.org/r/12936/#comment50068>

    Let's clean up the unused imports after this refactoring.
connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
<https://reviews.apache.org/r/12936/#comment49652>

    This method seems to have been generated automatically, but the fail()
    call has been removed; is that intentional?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49656>

    The wiki [1] also says that 0x5C should be substituted.

    1: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49658>

    Similarly to the above, it seems we are missing the \\ replacement for 0x5C.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50050>

    Is it wise to consume the exception here without any counters or other
    reporting except a log message? It also seems that other parts of the code
    are not null-safe, so this error handling will most likely just cause an
    NPE somewhere else.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49660>

    The setData() method runs the validations conditionally; are they
    intentionally skipped here in setTextData()?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49661>

    I'm afraid that String.split() won't work correctly for the following
    example input: 1,"Hi, here is Jarcec" - that contains two columns, but
    three columns will be created when splitting by comma.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50051>

    Nit: Please pass the correct array size instead of zero.
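To make the String.split() concern above concrete, here is a small,
hypothetical sketch (class and method names are illustrative, not part of the
patch) contrasting the naive split with a quote-aware split:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvSplitSketch {

    // Naive approach: splits on every comma, including those inside quotes.
    public static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Quote-aware split: only commas outside double quotes act as delimiters.
    public static List<String> quotedSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;   // toggle quoted state, keep the quote
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // flush the last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Hi, here is Jarcec\"";
        System.out.println(naiveSplit(line).length);   // 3 - wrong
        System.out.println(quotedSplit(line).size());  // 2 - expected
    }
}
```

A proper implementation would also need to handle escaped quotes inside
quoted fields; this sketch only shows why the plain split is insufficient.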
connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50052>

    I would suggest being strict here and accepting only the upper-case
    variant of NULL.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50053>

    I would suggest dying quickly on unsupported data types rather than
    silently passing them through as un-escaped strings.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50055>

    Is there a reason to do our own join rather than use StringUtils.join()?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50056>

    Is it expected and desirable that the user can't reset the schema to NULL?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50035>

    Nit: Would it make more sense to name this IntermediateDataFormat (without
    the type) or CSVIntermediateDataFormat, since we are not allowing anything
    else?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50036>

    This will convert the binary data to a string of 0s and 1s, right? The
    intermediate data format wiki suggests the mysqldump approach, which
    serializes the bytes as they are. The 0-and-1 representation seems quite
    inefficient.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50037>

    Since the regular expressions are pre-built and the PatternSyntaxException
    should therefore not be thrown, is there any other exception that we are
    expecting?
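As a rough illustration of the size concern above (the class and method names
are illustrative, not from the patch): encoding each byte as eight '0'/'1'
characters expands the payload eightfold, while a mysqldump-style escaping
keeps most bytes as a single character.

```java
public class BinaryEncodingSketch {

    // One character per bit: an 8x expansion for every input byte.
    public static String toBitString(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                sb.append((b >> i & 1) == 1 ? '1' : '0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = new byte[] {0x41, 0x42, 0x43}; // the bytes of "ABC"
        // 3 input bytes become 24 characters in the bit-string encoding.
        System.out.println(toBitString(data).length()); // 24
    }
}
```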
connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50060>

    The conditional escaped argument seems quite dangerous to me - what if the
    user has saved the data on disk with escaped = true and is using this
    class to read the data back from disk? The default value is false, so this
    will remove some actual data, right?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50038>

    Since the regular expressions are pre-built and the PatternSyntaxException
    should therefore not be thrown, is there any other exception that we are
    expecting?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50058>

    Since a couple of the methods, such as setSchema(), getSchema(), and
    validateData(), will have to be implemented by every IDF in a very similar
    way, I'm wondering if it would make sense to convert the IDF into an
    abstract class and provide a base implementation.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50039>

    Can we add javadoc for those two methods? Also, they seem to be unused, so
    I'm wondering whether we plan to use them in the future or whether they
    are an artifact of previous development.

core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java
<https://reviews.apache.org/r/12936/#comment50040>

    Can we make this a Class instead of a String?

execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
<https://reviews.apache.org/r/12936/#comment50041>

    Nit: Can we keep the style of the previous lines and put this on a single
    line?
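The abstract-class suggestion above could look roughly like the following
sketch (all names here are illustrative, not the actual Sqoop 2 API): the
shared schema handling lives in the base class, and a concrete format only
implements the format-specific methods.

```java
// Hypothetical base class providing the behavior every IDF would otherwise
// duplicate (schema storage plus the null guard mentioned in the review).
public abstract class AbstractDataFormatSketch {

    private Object schema;

    public void setSchema(Object schema) {
        if (schema == null) {
            throw new IllegalArgumentException("schema cannot be reset to null");
        }
        this.schema = schema;
    }

    public Object getSchema() {
        return schema;
    }

    // Format-specific pieces stay abstract.
    public abstract String getTextData();
    public abstract void setTextData(String text);

    // Small demo helpers so the sketch is runnable on its own.
    public static String roundTrip(String text) {
        PassThroughFormat f = new PassThroughFormat();
        f.setTextData(text);
        return f.getTextData();
    }

    public static Object storeSchema(Object schema) {
        PassThroughFormat f = new PassThroughFormat();
        f.setSchema(schema);
        return f.getSchema();
    }
}

// Minimal concrete format: only the (de)serialization methods remain.
class PassThroughFormat extends AbstractDataFormatSketch {
    private String text;
    @Override public String getTextData() { return text; }
    @Override public void setTextData(String text) { this.text = text; }
}
```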
execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
<https://reviews.apache.org/r/12936/#comment50063>

    The length variable seems to be used only in the readFields() method, so
    I'm wondering if a local variable wouldn't be a better fit.

execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
<https://reviews.apache.org/r/12936/#comment50064>

    Casting the data to UTF-8 might be quite dangerous, especially for binary
    values, as they will be interpreted and possibly corrupted. This won't be
    an issue with the current implementation that stores binary data as a
    stream of 0s and 1s, but it will become an issue when (if) we switch to
    the format designed on the wiki.

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
<https://reviews.apache.org/r/12936/#comment50070>

    This import does not seem to be relevant any more.

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
<https://reviews.apache.org/r/12936/#comment50042>

    Nit: Please keep this on a single line.

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
<https://reviews.apache.org/r/12936/#comment50043>

    Nit: This change does not seem necessary.

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
<https://reviews.apache.org/r/12936/#comment50071>

    This import does not seem to be relevant any more.

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
<https://reviews.apache.org/r/12936/#comment50044>

    Nit: Can we please put this on a single line?

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
<https://reviews.apache.org/r/12936/#comment50072>

    This import does not seem to be relevant any more.

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
<https://reviews.apache.org/r/12936/#comment50045>

    Nit: Please put this on a single line.
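The UTF-8 concern above is easy to demonstrate with a hypothetical round-trip
helper (class and method names are illustrative): arbitrary bytes pushed
through a UTF-8 String do not survive, because byte sequences that are not
valid UTF-8 are replaced with U+FFFD on decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTripSketch {

    // Decode bytes as UTF-8 text, then re-encode: lossy for arbitrary bytes.
    public static byte[] roundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] valid = "plain text".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(valid, roundTrip(valid)));   // true

        // 0x89 is not a valid UTF-8 sequence on its own, so it is replaced
        // with the 3-byte encoding of U+FFFD and the data is corrupted.
        byte[] binary = new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47};
        System.out.println(Arrays.equals(binary, roundTrip(binary))); // false
    }
}
```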
execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
<https://reviews.apache.org/r/12936/#comment50069>

    This import does not seem relevant any more.

execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
<https://reviews.apache.org/r/12936/#comment50046>

    Nit: Please put this on a single line.

execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java
<https://reviews.apache.org/r/12936/#comment50047>

    It seems we are building the same schema in multiple test cases. Would it
    make sense to create a helper method for this rather than copy-pasting the
    code?

execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java
<https://reviews.apache.org/r/12936/#comment50065>

    Hari gets 20 experience points for using awesome text messages in the tests!

spi/pom.xml
<https://reviews.apache.org/r/12936/#comment50048>

    I'm not sure we want to make all connectors depend on the SDK. The SDK
    should be a light library of code for connectors; we should not force
    connectors to reuse anything. Is the dependency required?

spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
<https://reviews.apache.org/r/12936/#comment50049>

    Nit: Please put this on a single line.


Jarcec

- Jarek Cecho


On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2013, 3:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal
> representation of the data from the connector and the output formats.
> Connectors can choose to implement and support a format that is more
> efficient for them.
> Also separated the SqoopWritable so that we can use the intermediate data
> format independent of (current) Hadoop.
> 
> I ran a full build - all tests including integration tests pass. I have not
> added any new tests, yet. I will add unit tests for the new classes. Also, I
> have not tried running this on an actual cluster - so things may be broken.
> I'd like some initial feedback based on the current patch.
> 
> I also implemented escaping of characters. There is some work remaining to
> support binary format, but it is mostly integration; the basic implementation
> is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   connector/connector-sdk/pom.xml 4056e14 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a01 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 5ea0633 
>   spi/pom.xml 0b240e8 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>
