Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
......................................................................

IMPALA-3662: Don't double allocate tuples buffer in parquet scanner

HdfsScanner::StartNewRowBatch() is called once per row batch by
the parquet scanner to allocate a new row batch and tuple buffer.
Similarly, a scratch batch is created for each row batch in
HdfsParquetScanner::AssembleRows() which also contains the tuple
buffer. In reality, only the tuple buffer in the scratch batch is
used. So, the tuple buffer allocated by HdfsScanner::StartNewRowBatch()
is unused memory for the parquet scanner.

This change fixes the problem above by implementing
HdfsParquetScanner::StartNewRowBatch() which creates a new row
batch without allocating the tuple buffer. With this patch, the
memory consumption when materializing very wide tuples is reduced
by half.

Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.h
3 files changed, 14 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/4064/2
--
To view, visit http://gerrit.cloudera.org:8080/4064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
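
For readers who do not have the scanner code at hand, the double allocation described in the commit message can be pictured with a small standalone C++ sketch. This is not the Impala code: MemTracker, RowBatch, Scanner, ParquetScannerBefore, ParquetScannerAfter and BytesForOneBatch are hypothetical, simplified stand-ins. The sketch only assumes what the message states, namely that the base StartNewRowBatch() allocates a row batch plus a tuple buffer, that AssembleRows() allocates a scratch tuple buffer of its own, and that the fix overrides StartNewRowBatch() to skip the first buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical byte counter standing in for real memory tracking.
struct MemTracker {
  std::size_t bytes_allocated = 0;
};

// Hypothetical row batch; only its capacity matters here.
struct RowBatch {
  explicit RowBatch(std::size_t c) : capacity(c) {}
  std::size_t capacity;
};

// Stand-in for the generic scanner base class.
class Scanner {
 public:
  Scanner(MemTracker* tracker, std::size_t tuple_byte_size, std::size_t capacity)
      : tracker_(tracker), tuple_byte_size_(tuple_byte_size), capacity_(capacity) {
    assert(tracker_ != nullptr);
  }
  virtual ~Scanner() = default;

  // Default behaviour: a new row batch AND a tuple buffer sized for it.
  virtual void StartNewRowBatch() {
    batch_.reset(new RowBatch(capacity_));
    tuple_buffer_ = AllocateTupleBuffer();
  }

 protected:
  // Allocates one tuple buffer and records its size in the tracker.
  std::vector<unsigned char> AllocateTupleBuffer() {
    const std::size_t n = tuple_byte_size_ * capacity_;
    tracker_->bytes_allocated += n;
    return std::vector<unsigned char>(n);
  }

  MemTracker* tracker_;
  std::size_t tuple_byte_size_;
  std::size_t capacity_;
  std::unique_ptr<RowBatch> batch_;
  std::vector<unsigned char> tuple_buffer_;
};

// Before the change: inherits StartNewRowBatch(), so two buffers per batch.
class ParquetScannerBefore : public Scanner {
 public:
  ParquetScannerBefore(MemTracker* t, std::size_t tuple_byte_size, std::size_t capacity)
      : Scanner(t, tuple_byte_size, capacity) {}

  void AssembleRows() {
    StartNewRowBatch();                             // virtual dispatch
    scratch_tuple_buffer_ = AllocateTupleBuffer();  // the buffer actually used
  }

 private:
  std::vector<unsigned char> scratch_tuple_buffer_;
};

// After the change: the override creates the row batch only.
class ParquetScannerAfter : public ParquetScannerBefore {
 public:
  ParquetScannerAfter(MemTracker* t, std::size_t tuple_byte_size, std::size_t capacity)
      : ParquetScannerBefore(t, tuple_byte_size, capacity) {}

  void StartNewRowBatch() override { batch_.reset(new RowBatch(capacity_)); }
};

// Tuple-buffer bytes allocated while assembling a single row batch.
template <typename ScannerT>
std::size_t BytesForOneBatch(std::size_t tuple_byte_size, std::size_t capacity) {
  MemTracker tracker;
  ScannerT scanner(&tracker, tuple_byte_size, capacity);
  scanner.AssembleRows();
  return tracker.bytes_allocated;
}
```

In this toy model the "before" scanner allocates two tuple buffers per row batch and the "after" scanner allocates one, which mirrors the halving reported in the commit message for very wide tuples, where the tuple buffer dominates the per-batch footprint.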
