Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3376: Extra definition level when writing Parquet files
......................................................................


Patch Set 5:

(6 comments)

The change itself looks good; I think we can simplify and improve the tests a
bit, though.

http://gerrit.cloudera.org:8080/#/c/3556/5/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 380:   if (value != NULL) {
We're up to 4 levels of nesting in this function, which seems a bit high to be
readable. I think we should either factor the retry loop out into a function,
or just add

  if (value == NULL) break;

inside the loop as before, so that the control flow for the null and non-null
cases is more analogous.
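A minimal sketch of the first option, using a hypothetical toy writer rather
than Impala's classes (ToyPage, ToyColumnWriter and TryAppendRow are
illustrative names, and the "slots" page model is an assumption). The retry
loop lives in AppendRow(), while TryAppendRow() holds the per-row logic, where
the NULL and non-NULL paths differ only by an early return after the
definition level.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy page with a fixed number of slots; a definition level and a
// value each take one slot. This is NOT Impala's page layout.
struct ToyPage {
  int capacity;
  std::vector<int64_t> slots;
  explicit ToyPage(int cap) : capacity(cap) {}
  bool HasRoom(int n) const {
    return static_cast<int>(slots.size()) + n <= capacity;
  }
};

class ToyColumnWriter {
 public:
  // Assumption: a page must fit the largest row (def level + value = 2 slots),
  // otherwise the retry loop below could never make progress.
  explicit ToyColumnWriter(int page_capacity) : capacity_(page_capacity) {
    if (page_capacity < 2) throw std::invalid_argument("page too small");
    pages_.emplace_back(capacity_);
  }

  // Retry loop, factored out of the per-row logic: start new pages until the
  // row fits. Terminates because an empty page always fits one row.
  void AppendRow(const int64_t* value) {
    while (!TryAppendRow(&pages_.back(), value)) pages_.emplace_back(capacity_);
  }

  const std::vector<ToyPage>& pages() const { return pages_; }

 private:
  // Writes one row into `page`, or nothing at all if it does not fit, so a
  // failed attempt never leaves a stray definition level behind.
  static bool TryAppendRow(ToyPage* page, const int64_t* value) {
    const int needed = (value == nullptr) ? 1 : 2;
    if (!page->HasRoom(needed)) return false;
    page->slots.push_back(value == nullptr ? 0 : 1);  // definition level
    if (value == nullptr) return true;                // NULL: def level only
    page->slots.push_back(*value);                    // non-NULL: add value
    return true;
  }

  int capacity_;
  std::vector<ToyPage> pages_;
};
```

With a capacity of 3, appending 7, NULL and 8 fills the first page with
[1, 7, 0] and puts [1, 8] on a second page.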


Line 415:   DCHECK(ret);
Can you add a short comment to explain the DCHECK? Is the buffer_full() check 
above sufficient? Is a new page always guaranteed to have enough space to write 
a def level?
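One way such a DCHECK could be documented, sketched with a hypothetical
byte-counting page (ToyPage, WriteDefLevel and the kMaxDefLevelBytes bound are
illustrative assumptions, and assert stands in for DCHECK): the comment names
the invariant that makes the retry infallible, and the code that enforces it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical upper bound on the bytes one encoded definition level needs.
// The real bound depends on the level encoder; this value is illustrative.
constexpr std::size_t kMaxDefLevelBytes = 8;

// Hypothetical page that only tracks bytes used out of a fixed size.
struct ToyPage {
  std::size_t size;
  std::size_t used = 0;
  explicit ToyPage(std::size_t s) : size(s) {
    // Enforce the invariant the assertion in WriteDefLevel() relies on.
    if (s < kMaxDefLevelBytes) throw std::invalid_argument("page too small");
  }
  bool TryWrite(std::size_t nbytes) {
    if (used + nbytes > size) return false;
    used += nbytes;
    return true;
  }
};

// Writes one def level, starting a new page if the current one is full.
// Returns the updated page count.
std::size_t WriteDefLevel(ToyPage* page, std::size_t pages) {
  if (page->TryWrite(kMaxDefLevelBytes)) return pages;
  *page = ToyPage(page->size);
  bool ret = page->TryWrite(kMaxDefLevelBytes);
  // A new page is empty and ToyPage's constructor guarantees
  // size >= kMaxDefLevelBytes, so writing one def level cannot fail here.
  assert(ret);  // stand-in for DCHECK(ret)
  (void)ret;
  return pages + 1;
}
```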


http://gerrit.cloudera.org:8080/#/c/3556/5/be/src/util/parquet-reader.cc
File be/src/util/parquet-reader.cc:

PS5, Line 172: int
Make it int32_t here and on the line above, to be explicit.


http://gerrit.cloudera.org:8080/#/c/3556/5/tests/query_test/test_writers.py
File tests/query_test/test_writers.py:

I'm not sure that it makes sense to have this as a separate test file. We
already have tests/query_test/test_insert_parquet.py.


Line 25: @SkipIf.not_hdfs
Is this necessary? I think we should be able to do this for all supported
filesystems, provided that we construct the right URLs.

For example, TestParquet in tests/query_test/test_scanners.py uses the hdfs
command-line utility.


Line 37:   def test_hdfs_parquet_table_writer(self, vector, unique_database):
This seems like a fairly narrowly targeted test (just a column of integers).
Maybe rename it to something like test_parquet_bigint_encoding?


-- 
To view, visit http://gerrit.cloudera.org:8080/3556

Gerrit-MessageType: comment
Gerrit-Change-Id: I2cafd7ef6b607ce6f815072b8af7395a892704d9
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
