-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30281/#review69935
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114798>

    Hi Sergio, I am a little confused about the purpose of pushing startField & endField down. As the method name "writeGroupFields" indicates, it writes the fields of a group one by one. My suggestion is to move these two lines back. If I missed anything, please tell me your reasoning for this change.

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114774>

    Could you add an annotation for the index here?

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114775>

    The same as above.

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114796>

    The same as above.

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114799>

    No need for the null value check.

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114800>

    How about the following code snippet?

        recordConsumer.startField(fieldName, i);
        if (i % 2 == 0) {
            writeValue(keyElement, keyInspector, fieldType);
        } else {
            writeValue(valueElement, valueInspector, fieldType);
        }
        recordConsumer.endField(fieldName, i);

ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114801>

    Could we use an import statement for the ByteWritable package at the beginning?

ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
<https://reviews.apache.org/r/30281/#comment114802>

    The same as above.
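As a side note on comment 114800 above, the alternating key/value selection can be sketched in isolation. Everything below is hypothetical and only illustrative: a plain list stands in for Parquet's RecordConsumer, and the method and inspector names are placeholders rather than the actual DataWritableWriter code. The point is simply that when map entries are flattened as key, value, key, value, ..., an even index selects the key inspector and an odd index the value inspector.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the loop suggested in comment 114800.
// The RecordConsumer calls are recorded as strings so the call
// order can be inspected; none of these names are real Hive APIs.
public class MapFieldOrderSketch {

    // Returns the ordered "consumer calls" for one flattened map entry list.
    static List<String> writeMapFields(String[] flattened, String fieldName) {
        List<String> calls = new ArrayList<>();
        for (int i = 0; i < flattened.length; i++) {
            calls.add("startField(" + fieldName + ", " + i + ")");
            if (i % 2 == 0) {
                // even index: a map key, handled by the key inspector
                calls.add("writeValue(" + flattened[i] + ", keyInspector)");
            } else {
                // odd index: a map value, handled by the value inspector
                calls.add("writeValue(" + flattened[i] + ", valueInspector)");
            }
            calls.add("endField(" + fieldName + ", " + i + ")");
        }
        return calls;
    }

    public static void main(String[] args) {
        for (String call : writeMapFields(new String[]{"k0", "v0"}, "map")) {
            System.out.println(call);
        }
    }
}
```

The sketch also makes visible that each writeValue is bracketed by its own startField/endField pair, which is the structure the comment proposes.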
serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java
<https://reviews.apache.org/r/30281/#comment114803>

    Better to cast it in the places where it is used, since it is set by an ObjectInspector object.


Hi Sergio, thank you for your changes. I have a few new comments left.

- cheng xu


On Jan. 27, 2015, 6:47 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30281/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2015, 6:47 p.m.)
> 
> 
> Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
> 
> 
> Bugs: HIVE-9333
>     https://issues.apache.org/jira/browse/HIVE-9333
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch moves the ParquetHiveSerDe.serialize() implementation to the
> DataWritableWriter class in order to save time in materializing data on
> serialize().
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 
>   serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30281/diff/
> 
> 
> Testing
> -------
> 
> The tests run were the following:
> 
> 1. JMH (Java microbenchmark)
> 
> This benchmark called the Parquet serialize/write methods using text
> writable objects.
> 
> Class.method                 Before Change (ops/s)   After Change (ops/s)
> -------------------------------------------------------------------------
> ParquetHiveSerDe.serialize           19,113               249,528   -> 19x speed increase
> DataWritableWriter.write              5,033                 5,201   -> 3.34% speed increase
> 
> 
> 2. Write 20 million rows (~1GB file) from Text to Parquet
> 
> I wrote a ~1GB file in Textfile format, then converted it to Parquet
> format using the following statement:
> 
>     CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;
> 
> Time (s) it took to write the whole file BEFORE the changes: 93.758 s
> Time (s) it took to write the whole file AFTER the changes:  83.903 s
> 
> That is about a 10% speed increase.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
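For readers unfamiliar with the ops/s figures quoted in the testing section above: an operations-per-second number is obtained by timing many repeated calls to the method under test. The patch used JMH for this; the sketch below is only a rough, hypothetical stand-in (a plain nanoTime loop around a placeholder serialize method) that shows how such a figure is derived. A real JMH benchmark additionally handles warmup, forking, and dead-code elimination, which this loop does not.

```java
// Hypothetical illustration of measuring ops/s, NOT the actual JMH
// benchmark from the review. The serialize() here is a placeholder.
public class SerializeThroughputSketch {

    // Stand-in for the method under test (e.g. a serialize call).
    static String serialize(int row) {
        return "row-" + row;
    }

    // Times `iterations` calls and returns operations per second.
    static double opsPerSecond(int iterations) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            // Consume each result so the work is not trivially removable.
            sink += serialize(i).length();
        }
        long elapsedNanos = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keep `sink` observably live
        }
        return iterations / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        System.out.printf("%.0f ops/s%n", opsPerSecond(1_000_000));
    }
}
```

Comparing such a figure before and after a patch, as the table above does, gives the relative speedup of the serialization path.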