Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/4064
Change subject: IMPALA-3662: Don't double allocate tuples' buffer in parquet scanner
......................................................................

IMPALA-3662: Don't double allocate tuples' buffer in parquet scanner

The parquet scanner calls HdfsScanner::StartNewRowBatch() once per row
batch to allocate a new row batch and its tuple buffer. In addition,
HdfsParquetScanner::AssembleRows() creates a scratch batch for each row
batch, and the scratch batch has its own tuple buffer. In practice, only
the scratch batch's tuple buffer is used. The tuple buffer allocated by
HdfsScanner::StartNewRowBatch() is therefore wasted memory for the
parquet scanner.

This change fixes the problem by implementing
HdfsParquetScanner::StartNewRowBatch(), which creates a new row batch
without allocating a tuple buffer. With this patch, memory consumption
when materializing very wide tuples is reduced by half.

Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.h
3 files changed, 14 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/4064/1
--
To view, visit http://gerrit.cloudera.org:8080/4064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <[email protected]>
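The following is a minimal C++ sketch of the double-allocation pattern described in the commit message and the fix it proposes. It uses simplified stand-ins: SketchRowBatch, SketchHdfsScanner, SketchParquetScanner, AllocateTupleBuffer, BytesPerBatch and g_tuple_bytes are hypothetical names, not Impala's actual classes or signatures. The base scanner's StartNewRowBatch() allocates a tuple buffer. The parquet override skips that allocation because tuples are materialized into the scratch batch's buffer instead.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for a row batch; not Impala's RowBatch.
struct SketchRowBatch {
  int capacity;
  std::vector<uint8_t> tuple_buffer;  // stays empty if no buffer was allocated
  explicit SketchRowBatch(int cap) : capacity(cap) {}
};

// Hypothetical counter of tuple-buffer bytes (a stand-in for a memory tracker).
int64_t g_tuple_bytes = 0;

// Allocates a tuple buffer for `capacity` tuples and records its size.
std::vector<uint8_t> AllocateTupleBuffer(int capacity, int tuple_byte_size) {
  assert(capacity >= 0 && tuple_byte_size >= 0);
  int64_t bytes = static_cast<int64_t>(capacity) * tuple_byte_size;
  g_tuple_bytes += bytes;
  return std::vector<uint8_t>(static_cast<size_t>(bytes));
}

// Base behavior (before the fix): every new row batch gets a tuple buffer.
class SketchHdfsScanner {
 public:
  virtual ~SketchHdfsScanner() = default;
  virtual std::unique_ptr<SketchRowBatch> StartNewRowBatch(int capacity,
                                                           int tuple_byte_size) {
    auto batch = std::make_unique<SketchRowBatch>(capacity);
    batch->tuple_buffer = AllocateTupleBuffer(capacity, tuple_byte_size);
    return batch;
  }
};

// Parquet override (the fix): the row batch is created without a tuple
// buffer, because tuples are materialized into the scratch batch's buffer.
class SketchParquetScanner : public SketchHdfsScanner {
 public:
  std::unique_ptr<SketchRowBatch> StartNewRowBatch(
      int capacity, int /*tuple_byte_size*/) override {
    return std::make_unique<SketchRowBatch>(capacity);
  }
};

// Simulates one row batch of work: StartNewRowBatch() followed by the
// per-batch scratch buffer that the assembly step materializes into.
// Returns the tuple-buffer bytes allocated for that batch.
int64_t BytesPerBatch(SketchHdfsScanner& scanner, int capacity,
                      int tuple_byte_size) {
  g_tuple_bytes = 0;
  auto batch = scanner.StartNewRowBatch(capacity, tuple_byte_size);
  auto scratch_buffer = AllocateTupleBuffer(capacity, tuple_byte_size);
  (void)batch;
  (void)scratch_buffer;
  return g_tuple_bytes;
}
```

Consider a batch of 1024 rows with a wide 4096-byte tuple. BytesPerBatch() returns 8388608 bytes for SketchHdfsScanner (two 4 MiB buffers) and 4194304 bytes for SketchParquetScanner (only the scratch buffer). That matches the "reduced by half" claim for this simplified model, where tuple buffers dominate memory use.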
