Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-3376: Extra definition level when writing Parquet files
......................................................................
IMPALA-3376: Extra definition level when writing Parquet files

Currently, when writing a new value to a Parquet file, we write its
definition level before checking whether there is enough space on the
current page for the value. If there isn't, we create a new page and
write the definition level again on it, but the copy already written to
the old page is left behind, so the old page ends up with an extra
definition level that has no corresponding value.

To fix this, we now check that there is enough space for both the
definition level and the value before writing either.

This patch also modifies the parquet-reader tool, which reads Parquet
files and performs minimal sanity checking on their metadata, so that it
checks for extra definition levels, and adds a test that runs the tool
automatically.

Change-Id: I2cafd7ef6b607ce6f815072b8af7395a892704d9
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M tests/common/skip.py
A tests/query_test/test_writers.py
5 files changed, 165 insertions(+), 36 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/56/3556/2
--
To view, visit http://gerrit.cloudera.org:8080/3556

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2cafd7ef6b607ce6f815072b8af7395a892704d9
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Lars Volker
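The ordering bug described in the commit message can be modeled in a few lines of C++. This is a hypothetical, simplified sketch and not the code in hdfs-parquet-table-writer.cc: Page, AppendBuggy, AppendFixed and HasExtraDefLevels are illustrative names, definition levels are assumed to cost one byte each (the real writer RLE-encodes them), and every value is assumed to be a non-null 4-byte INT32. HasExtraDefLevels is only in the spirit of the check added to parquet-reader; it compares each page's level count with its value count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a Parquet data page, NOT Impala's writer classes.
// Assumptions: every value is non-null (definition level 1), each level
// costs one byte, and each value is a PLAIN-encoded 4-byte INT32.
struct Page {
  size_t capacity;                  // byte budget for this page
  std::vector<uint8_t> def_levels;  // one definition level per value
  std::vector<int32_t> values;      // the values themselves
  size_t num_values = 0;            // count a page header would record

  explicit Page(size_t cap) : capacity(cap) {}

  size_t used() const {
    return def_levels.size() + values.size() * sizeof(int32_t);
  }
  size_t free_bytes() const { return capacity - used(); }
};

// Buggy order: the level is written before the value's space is checked.
// Precondition: pages is non-empty.
void AppendBuggy(std::vector<Page>& pages, size_t page_size, int32_t v) {
  if (pages.back().free_bytes() < 1) pages.emplace_back(page_size);
  pages.back().def_levels.push_back(1);  // level written first
  if (pages.back().free_bytes() < sizeof(int32_t)) {
    pages.emplace_back(page_size);         // value needs a new page
    pages.back().def_levels.push_back(1);  // level written again; the copy
  }                                        // on the old page is never removed
  pages.back().values.push_back(v);
  ++pages.back().num_values;
}

// Fixed order: make sure the level AND the value fit before writing either.
// Precondition: pages is non-empty.
void AppendFixed(std::vector<Page>& pages, size_t page_size, int32_t v) {
  const size_t needed = 1 + sizeof(int32_t);
  assert(page_size >= needed);  // otherwise no page could ever hold a value
  if (pages.back().free_bytes() < needed) pages.emplace_back(page_size);
  pages.back().def_levels.push_back(1);
  pages.back().values.push_back(v);
  ++pages.back().num_values;
}

// Sanity check: every page should hold exactly one level per recorded value.
bool HasExtraDefLevels(const std::vector<Page>& pages) {
  for (const Page& p : pages) {
    if (p.def_levels.size() != p.num_values) return true;
  }
  return false;
}
```

With an 8-byte page, the second value forces a new page: AppendBuggy leaves two levels but only one value on the first page, which HasExtraDefLevels reports, while AppendFixed moves the whole entry (level and value) to the new page.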
