Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet scanner ......................................................................
IMPALA-3662: Don't double allocate tuples buffer in parquet scanner HdfsScanner::StartNewRowBatch() is called once per row batch by the parquet scanner to allocate a new row batch and tuple buffer. Similarly, a scratch batch is created for each row batch in HdfsParquetScanner::AssembleRows() which also contains the tuple buffer. In reality, only the tuple buffer in the scratch batch is used. So, the tuple buffer allocated by HdfsScanner::StartNewRowBatch() is unused memory for the parquet scanner. This change fixes the problem above by implementing HdfsParquetScanner::StartNewRowBatch() which creates a new row batch without allocating the tuple buffer. With this patch, the memory consumption when materializing very wide tuples is reduced by half. Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 Reviewed-on: http://gerrit.cloudera.org:8080/4064 Reviewed-by: Michael Ho <[email protected]> Tested-by: Internal Jenkins --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scanner.h 3 files changed, 14 insertions(+), 3 deletions(-) Approvals: Michael Ho: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]>
