Christopher Channing has uploaded a new patch set (#2).

Change subject: IMPALA-3964: Remove erroneous DCHECK in Parquet scanner.
......................................................................

IMPALA-3964: Remove erroneous DCHECK in Parquet scanner.

The Bug:
Prior to this patch, a DCHECK verified that the memory pool underlying the scratch batch was empty in count-based scenarios. For IMPALA-3964 (a count(*) over a nested collection), if a Parquet column chunk is compressed, each new data page is decompressed as it is read, and the decompressed data is eventually placed into the scratch batch's underlying memory pool, causing the aforementioned DCHECK to fail. The test suite did not catch this because the TPC-H nested Parquet data is not compressed.

The Fix:
Removed the erroneous DCHECK. Added logic to determine whether any remaining tuple pointers need to be moved to the destination row batch. Augmented the load_nested.py script to Snappy-compress each of the tables in the 'tpch_nested_parquet' database.

Note: no new tests are needed, as a number of existing functional nested Parquet tests already cover this scenario.

Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
---
M be/src/exec/hdfs-parquet-scanner.cc
M testdata/bin/load_nested.py
2 files changed, 8 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/3940/2
--
To view, visit http://gerrit.cloudera.org:8080/3940

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Christopher Channing
Gerrit-Reviewer: Alex Behm
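The interaction described under "The Bug" and "The Fix" in the commit message can be illustrated with a small, self-contained C++ sketch. This is not the code from hdfs-parquet-scanner.cc: MemPool, ScratchBatch, RowBatch and TransferEmptyTuples are hypothetical, simplified stand-ins invented for illustration. The sketch assumes that in the count(*) path every scratch tuple is emitted as a NULL tuple pointer and that decompressed page buffers are owned by the scratch batch's pool. Rather than asserting that this pool is empty, it moves as many remaining tuple pointers as the destination batch has room for, and hands the pool's memory to the destination batch only once the scratch batch is exhausted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; NOT Impala's real MemPool/RowBatch classes.
struct MemPool {
  std::vector<std::unique_ptr<uint8_t[]>> chunks;
  int64_t total_bytes = 0;

  uint8_t* Allocate(int64_t len) {
    chunks.emplace_back(new uint8_t[len]);
    total_bytes += len;
    return chunks.back().get();
  }

  // Takes ownership of every chunk in src, leaving src empty.
  void AcquireData(MemPool* src) {
    for (auto& chunk : src->chunks) chunks.push_back(std::move(chunk));
    total_bytes += src->total_bytes;
    src->chunks.clear();
    src->total_bytes = 0;
  }
};

struct Tuple;  // Opaque: a count(*) over a collection materializes no slots.

// Hypothetical scratch batch: decoded tuples plus a pool that may own
// decompressed data-page buffers when the column chunk is compressed.
struct ScratchBatch {
  int num_tuples = 0;  // tuples decoded into this scratch batch
  int tuple_idx = 0;   // next tuple to transfer to an output batch
  MemPool mem_pool;
  bool AtEnd() const { return tuple_idx == num_tuples; }
};

// Hypothetical output batch with a fixed row capacity.
struct RowBatch {
  int capacity;
  std::vector<Tuple*> rows;
  MemPool tuple_data_pool;
  explicit RowBatch(int cap) : capacity(cap) {}
};

// Emits one NULL tuple pointer per remaining scratch tuple, up to the
// destination's free capacity, and returns how many were emitted. Instead of
// assuming scratch->mem_pool is empty, it transfers that memory to the
// destination once every scratch tuple has been moved.
int TransferEmptyTuples(ScratchBatch* scratch, RowBatch* dst) {
  int room = dst->capacity - static_cast<int>(dst->rows.size());
  int remaining = scratch->num_tuples - scratch->tuple_idx;
  int n = std::min(room, remaining);
  dst->rows.insert(dst->rows.end(), static_cast<std::size_t>(n),
                   static_cast<Tuple*>(nullptr));
  scratch->tuple_idx += n;
  if (scratch->AtEnd()) {
    // Compressed pages leave decompressed buffers behind; pass them on so they
    // are freed with the output batch rather than tripping an emptiness check.
    dst->tuple_data_pool.AcquireData(&scratch->mem_pool);
  }
  return n;
}
```

If the destination batch fills up before the scratch batch is exhausted, the pool stays with the scratch batch until a later call drains the remaining tuples. An emptiness check on the scratch pool, by contrast, fails as soon as a decompressed page buffer has been attached to it, which only happens when the column chunk is compressed.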
