-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/#review49734
-----------------------------------------------------------

src/java/org/apache/sqoop/mapreduce/ParquetOutputFormat.java
<https://reviews.apache.org/r/24223/#comment87043>

    Could you use org.kitesdk.data.mapreduce.DatasetKeyOutputFormat instead of writing your own?


- Joey Echeverria


On Aug. 6, 2014, 7:56 a.m., Qian Xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24223/
> -----------------------------------------------------------
>
> (Updated Aug. 6, 2014, 7:56 a.m.)
>
>
> Review request for Sqoop.
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> The patch adds the ability to import an individual table from
> an RDBMS into HDFS as a set of Parquet files. It also adds a new
> command-line argument, `--as-parquetfile`.
> Example invocation: `sqoop import --connect JDBC_URI --table TABLE
> --as-parquetfile --target-dir /path/to/files`
>
> The major items are as follows:
> * Implement `ParquetImportMapper`.
> * Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import
>   job.
>
> As Parquet is a columnar storage format, it doesn't make sense to write to it
> directly from record-based tools. We've considered using Kite SDK to
> simplify the handling of Parquet-specific details. The main idea is to
> convert each `SqoopRecord` to a `GenericRecord` and write it into a Kite
> dataset. Kite SDK then writes these records out as a set of Parquet files.
>
>
> Diffs
> -----
>
>   ivy.xml abc12a1
>   ivy/libraries.properties a59471e
>   src/docs/man/import-args.txt a4ce4ec
>   src/docs/man/sqoop-import-all-tables.txt 6b639f5
>   src/docs/user/hcatalog.txt cd1dde3
>   src/docs/user/help.txt a9e1e89
>   src/docs/user/import-all-tables.txt 60645f1
>   src/docs/user/import.txt 192e97e
>   src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea
>   src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d
>   src/licenses/LICENSE-BIN.txt 4215d26
>   src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/24223/diff/
>
>
> Testing
> -------
>
> Manually tested with a MySQL database. Unit tests are still being developed.
>
>
> Thanks,
>
> Qian Xu
>
>
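
For readers following the thread, the conversion step mentioned in the description (turning each row-oriented record into a schema-ordered generic record before it is handed to a dataset writer) might look roughly like the sketch below. This is an illustration only, not code from the patch: it uses no Sqoop, Avro, or Kite classes. `GenericRow` and `toGenericRow` are hypothetical stand-ins for a `GenericRecord` and for the mapper logic, and the plain `Map` stands in for the fields of a `SqoopRecord`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordConversionSketch {

    /** Hypothetical stand-in for a generic record: values are stored in schema order. */
    static final class GenericRow {
        private final List<String> schema;
        private final Object[] values;

        GenericRow(List<String> schema) {
            this.schema = schema;
            this.values = new Object[schema.size()];
        }

        /** Stores a value at the position the schema assigns to the field. */
        void put(String field, Object value) {
            int idx = schema.indexOf(field);
            if (idx < 0) {
                throw new IllegalArgumentException("Field not in schema: " + field);
            }
            values[idx] = value;
        }

        /** Returns the value for a field, or null if the field is unknown or unset. */
        Object get(String field) {
            int idx = schema.indexOf(field);
            return idx < 0 ? null : values[idx];
        }

        /** Read-only view of the values in schema order. */
        List<Object> values() {
            return Collections.unmodifiableList(Arrays.asList(values));
        }
    }

    /** Copies every entry of a row's field map into a schema-ordered generic row. */
    static GenericRow toGenericRow(List<String> schema, Map<String, Object> fieldMap) {
        GenericRow row = new GenericRow(schema);
        for (Map.Entry<String, Object> entry : fieldMap.entrySet()) {
            row.put(entry.getKey(), entry.getValue());
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name");

        // The field map may arrive in any order; the schema decides the layout.
        Map<String, Object> fieldMap = new LinkedHashMap<>();
        fieldMap.put("name", "alice");
        fieldMap.put("id", 1);

        GenericRow row = toGenericRow(schema, fieldMap);
        System.out.println(row.values()); // prints [1, alice]
    }
}
```

Because every row ends up in the same schema order, a downstream writer can group values by column, which is the work the description proposes to delegate to Kite SDK.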
