Matthew Jacobs has submitted this change and it was merged.

Change subject: IMPALA-3376: Extra definition level when writing Parquet files
......................................................................

IMPALA-3376: Extra definition level when writing Parquet files

Currently, when writing a new value to a parquet file, we write the
definition level before checking if there's enough space on the current
page for the value. If there isn't, we create a new page and rewrite the
definition level to it, but this leaves the definition level for that
value still written to the old page.

To fix this, we should make sure that we have enough space to write both
the definition level and the value before writing either.

This patch also modifies the parquet-reader tool, which reads parquet
files and performs minimal sanity checking on their metadata, to check
for extra definition levels, and adds a test that runs the tool
automatically.

Change-Id: I20f25a90aa1ef74b4f00f38f832bc1c1853342c6
Reviewed-on: http://gerrit.cloudera.org:8080/3835
Reviewed-by: Thomas Tauber-Marshall <[email protected]>
Tested-by: Internal Jenkins
Reviewed-by: Matthew Jacobs <[email protected]>
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M tests/query_test/test_insert_parquet.py
4 files changed, 137 insertions(+), 16 deletions(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Internal Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, but someone else must approve

--
To view, visit http://gerrit.cloudera.org:8080/3835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I20f25a90aa1ef74b4f00f38f832bc1c1853342c6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
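
The ordering problem described in the commit message can be illustrated with a small model. The sketch below is not Impala's code: `Page`, `ColumnWriter`, the byte sizes, and the `check_space_first` flag are all hypothetical, every value is assumed to be non-null, and the real writer in be/src/exec/hdfs-parquet-table-writer.cc is C++ and encodes its levels rather than storing them in a list. It only shows why writing the definition level before the space check leaves one stray level on the old page each time a page fills up, and how a per-page count comparison (in the spirit of the parquet-reader check) would catch it.

```python
from dataclasses import dataclass, field

# Hypothetical sizes, chosen only to make pages fill up quickly.
DEF_LEVEL_SIZE = 1  # bytes charged per definition level (assumption)
VALUE_SIZE = 4      # bytes charged per value (assumption)


@dataclass
class Page:
    """Toy data page: a byte budget plus the levels and values written to it."""
    capacity: int
    def_levels: list = field(default_factory=list)
    values: list = field(default_factory=list)
    used: int = 0


class ColumnWriter:
    """Toy column writer; check_space_first selects the fixed ordering."""

    def __init__(self, page_capacity, check_space_first):
        self.page_capacity = page_capacity
        self.check_space_first = check_space_first
        self.pages = [Page(page_capacity)]

    def _write_def_level(self, page):
        page.def_levels.append(1)  # 1 = value present (all values non-null here)
        page.used += DEF_LEVEL_SIZE

    def _write_value(self, page, value):
        page.values.append(value)
        page.used += VALUE_SIZE

    def _new_page(self):
        self.pages.append(Page(self.page_capacity))
        return self.pages[-1]

    def append(self, value):
        page = self.pages[-1]
        if self.check_space_first:
            # Fixed ordering: make sure both the level and the value fit
            # before writing either of them.
            if page.used + DEF_LEVEL_SIZE + VALUE_SIZE > page.capacity:
                page = self._new_page()
            self._write_def_level(page)
            self._write_value(page, value)
        else:
            # Buggy ordering: the level is written first, so when the value
            # does not fit, the old page keeps a level with no matching value.
            self._write_def_level(page)
            if page.used + VALUE_SIZE > page.capacity:
                page = self._new_page()
                self._write_def_level(page)
            self._write_value(page, value)


def extra_def_levels(writer):
    """Count definition levels that have no value on the same page."""
    return sum(len(p.def_levels) - len(p.values) for p in writer.pages)


buggy = ColumnWriter(page_capacity=12, check_space_first=False)
fixed = ColumnWriter(page_capacity=12, check_space_first=True)
for v in range(5):
    buggy.append(v)
    fixed.append(v)

print(extra_def_levels(buggy))  # 2: one stray level per page rollover
print(extra_def_levels(fixed))  # 0
```

With a 12-byte page and 5 bytes per level-plus-value pair, each page holds two values, so five values trigger two rollovers. The buggy ordering leaves one extra level behind on each of the two full pages, while the fixed ordering leaves none.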
