Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-3105: rework handling of tuple buffer sizing in RowBatch
......................................................................

IMPALA-3105: rework handling of tuple buffer sizing in RowBatch

RowBatch::MaxTupleBufferSize() tried to estimate the maximum number of
rows that would fit in a batch based on the soft capacity memory limit
of batches. The logic was wrong because the memory capacity can be
exceeded, either because exec nodes do not check capacity, or because
the limit is checked after adding a row, not before.
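
To illustrate the second failure mode, here is a minimal sketch (not the
actual Impala code; BytesAfterFill and its parameters are hypothetical
names) of a fill loop that checks a soft byte limit only after each row
has been added. The batch can end up over the limit by almost one row:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an exec node filling a batch. The soft limit is
// tested only after a row has been added, so the final total can exceed it.
inline int64_t BytesAfterFill(int64_t soft_limit_bytes, int64_t row_bytes) {
  assert(soft_limit_bytes > 0 && row_bytes > 0);
  int64_t used_bytes = 0;
  while (true) {
    used_bytes += row_bytes;                      // add the row first
    if (used_bytes >= soft_limit_bytes) break;    // then check the limit
  }
  return used_bytes;
}
```

With a 100-byte soft limit and 30-byte rows, this loop stops only once
the batch already holds 120 bytes, so a buffer sized from the soft limit
alone would be too small.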

Instead, this patch achieves the same goal by setting the hard
RowBatch::capacity_ limit to a value that keeps the total fixed-length
data for a row batch below a cap (unless a single row would exceed that
cap, in which case exceeding it cannot be avoided). This avoids corner
cases where the old MaxTupleBufferSize() calculation may have led to
buffer overruns, and it simplifies the logic.
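
As a rough sketch of the capacity calculation described above (the
helper name CappedCapacity and its parameters are assumptions made for
illustration, not the code in row-batch.cc), the row capacity can be
clamped so that the fixed-length data stays under a byte cap while still
allowing at least one row:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the largest row count <= requested_capacity
// whose fixed-length data fits within max_fixed_bytes, but never fewer
// than one row.
inline int CappedCapacity(int requested_capacity, int64_t fixed_row_bytes,
                          int64_t max_fixed_bytes) {
  assert(requested_capacity > 0);
  assert(fixed_row_bytes >= 0 && max_fixed_bytes > 0);
  // Zero-length rows use no fixed-length buffer, so no cap applies.
  if (fixed_row_bytes == 0) return requested_capacity;
  int64_t rows_that_fit = max_fixed_bytes / fixed_row_bytes;
  // A single row larger than the cap still has to fit: allow one row.
  rows_that_fit = std::max<int64_t>(rows_that_fit, 1);
  return static_cast<int>(
      std::min<int64_t>(requested_capacity, rows_that_fit));
}
```

Because the row count is a hard limit, the fixed-length buffer can be
sized as capacity times row size up front, with no dependence on when or
whether callers check the soft memory limit.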

Change-Id: Idfd9cd681875821c1c379d97586d3f4850aae622
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-scan-node.h
M be/src/exec/union-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
7 files changed, 53 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/73/2473/2
-- 
To view, visit http://gerrit.cloudera.org:8080/2473
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idfd9cd681875821c1c379d97586d3f4850aae622
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
