Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3376: Extra definition level when writing Parquet files
......................................................................

Patch Set 5:

(6 comments)

The change itself looks good. I think we can simplify and improve the tests a bit, though.

http://gerrit.cloudera.org:8080/#/c/3556/5/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 380: if (value != NULL) {
We're up to 4 levels of nesting in this function, which seems a bit high to be readable. I think we should either factor the retry loop out into a function, or just add "if (value == NULL) break;" inside the loop as before, so that the control flow for the null and non-null cases is more analogous.

Line 415: DCHECK(ret);
Can you add a short comment to explain the DCHECK? Is the buffer_full() check above sufficient? Is a new page always guaranteed to have enough space to write a def level?

http://gerrit.cloudera.org:8080/#/c/3556/5/be/src/util/parquet-reader.cc
File be/src/util/parquet-reader.cc:

PS5, Line 172: int
Make it int32_t here and on the line above to be explicit.

http://gerrit.cloudera.org:8080/#/c/3556/5/tests/query_test/test_writers.py
File tests/query_test/test_writers.py:

I'm not sure that it makes sense to have this as a separate test file. We already have tests/query_test/test_insert_parquet.py.

Line 25: @SkipIf.not_hdfs
Is this necessary? I think we should be able to do this for all supported filesystems, provided we construct the right URLs. E.g. TestParquet in tests/query_test/test_scanners.py uses the hdfs command line utility.

Line 37: def test_hdfs_parquet_table_writer(self, vector, unique_database):
This seems like a fairly narrowly targeted test (just a column of integers). Maybe rename it to something like test_parquet_bigint_encoding?

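For illustration only, here is a minimal self-contained sketch of the control flow suggested for Line 380: the null case breaks out early inside the retry loop, and the def level is written once after the loop. PageSketch, ColumnWriterSketch and all of their members are hypothetical stand-ins, not Impala's classes, and assert stands in for DCHECK. The sketch assumes that a fresh page always has room for at least one value and one def level; whether the real writer guarantees that is exactly the question raised for Line 415.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a data page with fixed capacities.
struct PageSketch {
  std::size_t value_capacity;
  std::size_t def_level_capacity;
  std::vector<int64_t> values;
  std::vector<bool> def_levels;

  bool def_levels_full() const { return def_levels.size() >= def_level_capacity; }

  // Returns false if the page has no room left for another value.
  bool TryPutValue(int64_t v) {
    if (values.size() >= value_capacity) return false;
    values.push_back(v);
    return true;
  }

  // Returns false if the page has no room left for another def level.
  bool PutDefLevel(bool is_non_null) {
    if (def_levels_full()) return false;
    def_levels.push_back(is_non_null);
    return true;
  }
};

// Hypothetical stand-in for a column writer that owns the current page.
struct ColumnWriterSketch {
  std::size_t value_capacity;
  std::size_t def_level_capacity;
  std::vector<PageSketch> finished_pages;
  PageSketch current;

  ColumnWriterSketch(std::size_t vcap, std::size_t dcap)
      : value_capacity(vcap), def_level_capacity(dcap), current{vcap, dcap, {}, {}} {
    // Assumption of this sketch: an empty page can hold one value and one def level.
    assert(vcap >= 1 && dcap >= 1);
  }

  // Finalizes the current page and starts an empty one.
  void NewPage() {
    finished_pages.push_back(current);
    current = PageSketch{value_capacity, def_level_capacity, {}, {}};
  }

  // A null pointer represents a NULL value.
  void AppendValue(const int64_t* value) {
    // Make sure there is room for the def level, but do not write it yet.
    if (current.def_levels_full()) NewPage();

    while (true) {
      if (value == nullptr) break;             // NULLs are not encoded as values.
      if (current.TryPutValue(*value)) break;  // Value written successfully.
      NewPage();                               // Page is full: retry on a new page.
    }

    // Either space was reserved above, or the loop moved to a fresh page,
    // which (by the assumption above) holds at least one def level.
    bool ret = current.PutDefLevel(value != nullptr);
    assert(ret);
    (void)ret;  // Avoid an unused-variable warning when asserts are compiled out.
  }
};
```

With this shape the null and non-null cases share a single exit path, and the loop body stays at two levels of nesting.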
--
To view, visit http://gerrit.cloudera.org:8080/3556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2cafd7ef6b607ce6f815072b8af7395a892704d9
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
