> On Jan. 27, 2015, 1:59 a.m., cheng xu wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java, line 130
> > <https://reviews.apache.org/r/30281/diff/1/?file=834396#file834396line130>
> >
> > Why remove compressionType code here?
I removed the code because it was unused. The compressionType field is private and is not referenced anywhere else in the code. It may have been part of a change a developer started but never finished.

> On Jan. 27, 2015, 1:59 a.m., cheng xu wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java, line 70
> > <https://reviews.apache.org/r/30281/diff/1/?file=834398#file834398line70>
> >
> > Why not define writeGroupFields with a parameter of ParquetWritable instead of passing in the object and objectInspector separately?

I did that at the beginning, but I had to create a new ParquetWritable() object every time I called the writeGroup() method. I wanted to avoid that memory allocation, as writeGroup() is called many times if you have a STRUCT data type in your schema.

> On Jan. 27, 2015, 1:59 a.m., cheng xu wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java, lines 225-229
> > <https://reviews.apache.org/r/30281/diff/1/?file=834398#file834398line225>
> >
> > Assume if i%2 equals 0, it means the key, and only if the key's value is not null do we write the value. What if a null value comes in for both the key and the value? Can we use the original approach of passing in the writable object and handling the null-value case in the writeValue method? The code would become simpler and easier to understand. Thanks.

I made the necessary changes to move the startField/endField calls to other methods in order to make the code clearer and more readable.

- Sergio

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30281/#review69723
-----------------------------------------------------------

On Jan. 27, 2015, 1:39 a.m., Sergio Pena wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail.
To reply, visit:
> https://reviews.apache.org/r/30281/
> -----------------------------------------------------------
>
> (Updated Jan. 27, 2015, 1:39 a.m.)
>
>
> Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
>
>
> Bugs: HIVE-9333
>     https://issues.apache.org/jira/browse/HIVE-9333
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch moves the ParquetHiveSerDe.serialize() implementation to the
> DataWritableWriter class in order to save the time spent materializing data on
> serialize().
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15
>   serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/30281/diff/
>
>
> Testing
> -------
>
> The tests run were the following:
>
> 1. JMH (Java microbenchmark)
>
> This benchmark called the parquet serialize/write methods using text writable
> objects.
>
> Class.method                 Before Change (ops/s)   After Change (ops/s)
> -------------------------------------------------------------------------------
> ParquetHiveSerDe.serialize:  19,113                  249,528  -> ~13x speed increase
> DataWritableWriter.write:    5,033                   5,201    -> 3.34% speed increase
>
>
> 2. Write 20 million rows (~1GB file) from Text to Parquet
>
> I wrote a ~1GB file in Textfile format, then converted it to Parquet format
> using the following statement:
>
>     CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;
>
> Time (s) it took to write the whole file BEFORE the changes: 93.758 s
> Time (s) it took to write the whole file AFTER the changes:  83.903 s
>
> That is a speed increase of about 10%.
>
>
> Thanks,
>
> Sergio Pena
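For readers following the thread, the core idea of the patch (deferring materialization from serialize() to write time, and reusing a thin wrapper instead of allocating one per writeGroup() call) can be sketched roughly as follows. This is a hypothetical, self-contained sketch: `RowInspector` and `LazyRowWritable` are stand-ins I invented for Hive's ObjectInspector and the new ParquetWritable; the real Hive classes have different APIs.

```java
import java.util.Arrays;
import java.util.List;

public class DeferredSerializeSketch {

    /** Stand-in for an ObjectInspector: knows how to read fields off a raw row. */
    interface RowInspector {
        List<Object> getFields(Object row);
    }

    /** Thin wrapper over (row, inspector), analogous in spirit to ParquetWritable. */
    static final class LazyRowWritable {
        final Object row;
        final RowInspector inspector;
        LazyRowWritable(Object row, RowInspector inspector) {
            this.row = row;
            this.inspector = inspector;
        }
    }

    /** serialize() becomes O(1): no field-by-field copy into Writable objects. */
    static LazyRowWritable serialize(Object row, RowInspector inspector) {
        return new LazyRowWritable(row, inspector);
    }

    /**
     * The writer materializes fields only once, at write time; null fields are
     * skipped here rather than by the caller (mirroring the startField/endField
     * discussion above). Returns the number of fields actually written.
     */
    static int write(LazyRowWritable w, StringBuilder out) {
        int written = 0;
        for (Object field : w.inspector.getFields(w.row)) {
            if (field != null) {          // null handling lives in the writer
                out.append(field).append('|');
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        RowInspector inspector = row -> (List<Object>) row;
        LazyRowWritable w = serialize(Arrays.asList("a", null, 3), inspector);
        StringBuilder out = new StringBuilder();
        int n = write(w, out);
        System.out.println(n + " fields written: " + out); // prints "2 fields written: a|3|"
    }
}
```

The point of the wrapper being cheap to construct is that nested STRUCT writes can create (or reuse) one per group without the allocation cost dominating, which is the trade-off Sergio describes above.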