-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/#review49742
-----------------------------------------------------------
I agree with Joey: it would be better to use the DatasetKeyOutputFormat so
you don't have to maintain an output format of your own.

You might also consider implementing a wrapper for SqoopRecord that
implements GenericRecord [1]. That would remove the need to copy the values
from one map to the other. (A rough sketch of such a wrapper appears after
the quoted request below.)

[1]: http://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/generic/GenericRecord.html

- Ryan Blue


On Aug. 6, 2014, 12:56 a.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24223/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2014, 12:56 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> The patch adds the ability to import an individual table from an RDBMS
> into HDFS as a set of Parquet files. It also adds a new command-line
> argument, `--as-parquetfile`.
> Example invocation: `sqoop import --connect JDBC_URI --table TABLE
> --as-parquetfile --target-dir /path/to/files`
> 
> The major items are as follows:
> * Implement `ParquetImportMapper`.
> * Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the
>   import job.
> 
> As Parquet is a columnar storage format, it doesn't make sense to write
> to it directly from record-based tools. We have therefore considered
> using the Kite SDK to simplify the handling of Parquet-specific details.
> The main idea is to convert each `SqoopRecord` to a `GenericRecord` and
> write them into a Kite dataset; the Kite SDK then writes these records
> out as a set of Parquet files.
> 
> 
> Diffs
> -----
> 
>   ivy.xml abc12a1 
>   ivy/libraries.properties a59471e 
>   src/docs/man/import-args.txt a4ce4ec 
>   src/docs/man/sqoop-import-all-tables.txt 6b639f5 
>   src/docs/user/hcatalog.txt cd1dde3 
>   src/docs/user/help.txt a9e1e89 
>   src/docs/user/import-all-tables.txt 60645f1 
>   src/docs/user/import.txt 192e97e 
>   src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
>   src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
>   src/licenses/LICENSE-BIN.txt 4215d26 
>   src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24223/diff/
> 
> 
> Testing
> -------
> 
> Manually tested with a MySQL database. Unit tests are still under
> development.
> 
> 
> Thanks,
> 
> Qian Xu
> 
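
P.S. Here is the rough, untested sketch of the wrapper mentioned above.
The class name SqoopGenericRecord is hypothetical, and it assumes field
access through SqoopRecord.getFieldMap() and SqoopRecord.setField(), which
the generated record classes provide; treat it as an illustration of the
idea rather than a finished implementation.

    package org.apache.sqoop.mapreduce;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.sqoop.lib.SqoopRecord;

    /**
     * A GenericRecord view over a SqoopRecord, so records can be handed
     * to an Avro/Kite writer without copying fields into a new map.
     */
    public class SqoopGenericRecord implements GenericRecord {

      private final Schema schema;
      private final SqoopRecord delegate;

      public SqoopGenericRecord(Schema schema, SqoopRecord delegate) {
        this.schema = schema;
        this.delegate = delegate;
      }

      @Override
      public Schema getSchema() {
        return schema;
      }

      @Override
      public Object get(String key) {
        // Read straight out of the generated record's field map.
        return delegate.getFieldMap().get(key);
      }

      @Override
      public void put(String key, Object v) {
        delegate.setField(key, v);
      }

      @Override
      public Object get(int i) {
        // Positional access resolves the field name via the Avro schema.
        return get(schema.getFields().get(i).name());
      }

      @Override
      public void put(int i, Object v) {
        put(schema.getFields().get(i).name(), v);
      }
    }

With something like this, the import mapper could wrap each SqoopRecord
once and pass it to the writer directly, instead of copying every field
into a fresh GenericData.Record per row.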
