Csaba Ringhofer created IMPALA-13306:
----------------------------------------
Summary: Store resources attached to row batches per-tuple descriptor
Key: IMPALA-13306
URL: https://issues.apache.org/jira/browse/IMPALA-13306
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Csaba Ringhofer
Currently, RowBatch handles resource-related info (e.g. FlushMode) globally
for the whole batch, while it may differ for each tuple descriptor.
An example is a row that comes from a join that didn't spill. In this case the
memory of the build side tuple remains valid until the join node is closed,
while the probe side can change more often, e.g. when the scratch batch in the
Parquet scanner gets full and is attached to the row batch.
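To illustrate the idea, here is a minimal sketch of resource info stored per
tuple descriptor instead of once per batch. All names (TupleMemLifetime,
PerTupleResourceInfo, IsStableAcrossBatches) are hypothetical and are not part
of Impala's RowBatch API; the two lifetime values are assumptions chosen to
match the join example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not Impala's actual RowBatch API.
enum class TupleMemLifetime {
  UNTIL_NEXT_BATCH,  // memory may be freed or reused once the next batch is produced
  UNTIL_NODE_CLOSE,  // memory stays valid until the producing node is closed
};

// Resource info kept once per tuple descriptor in the row, rather than
// once for the whole batch.
class PerTupleResourceInfo {
 public:
  // Every tuple starts with the most conservative (shortest) lifetime.
  explicit PerTupleResourceInfo(std::size_t num_tuples_per_row)
    : lifetimes_(num_tuples_per_row, TupleMemLifetime::UNTIL_NEXT_BATCH) {}

  void SetLifetime(std::size_t tuple_idx, TupleMemLifetime lifetime) {
    assert(tuple_idx < lifetimes_.size());
    lifetimes_[tuple_idx] = lifetime;
  }

  TupleMemLifetime lifetime(std::size_t tuple_idx) const {
    assert(tuple_idx < lifetimes_.size());
    return lifetimes_[tuple_idx];
  }

  // True if pointers to this tuple may be compared across batches.
  bool IsStableAcrossBatches(std::size_t tuple_idx) const {
    return lifetime(tuple_idx) == TupleMemLifetime::UNTIL_NODE_CLOSE;
  }

 private:
  std::vector<TupleMemLifetime> lifetimes_;
};
```

In the non-spilling join case, the join node would mark the build side tuple
as UNTIL_NODE_CLOSE and leave the probe side at the default.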
Some operators could benefit from knowing that some tuple pointers remain valid
for longer. An example is tuple deduplication in the KrpcDataStream sender: if
more than one row batch could be sent in a single OutboundRowBatch, then it
would be important to know whether the same tuple pointer really means the same
tuple in the new RowBatch.
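A rough sketch of how a sender could use such per-tuple information when
deduplicating across batches. It assumes deduplication compares each tuple
pointer with the one seen in the previous row; CrossBatchDeduper and its
methods are hypothetical and do not correspond to existing Impala code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; names do not correspond to Impala's sender code.
class CrossBatchDeduper {
 public:
  explicit CrossBatchDeduper(std::size_t num_tuples_per_row)
    : last_seen_(num_tuples_per_row, nullptr) {}

  // Call when a new input batch begins. stable[i] tells whether pointers to
  // tuple i from the previous batch are still valid. For unstable tuples the
  // remembered pointer is forgotten, since an equal address in the new batch
  // may refer to different data.
  void StartNewBatch(const std::vector<bool>& stable) {
    assert(stable.size() == last_seen_.size());
    for (std::size_t i = 0; i < last_seen_.size(); ++i) {
      if (!stable[i]) last_seen_[i] = nullptr;
    }
  }

  // Returns true if 'tuple' is the same pointer as in the previous row and
  // can therefore be skipped during serialization. Otherwise remembers it.
  bool IsDuplicate(std::size_t tuple_idx, const void* tuple) {
    assert(tuple_idx < last_seen_.size());
    if (tuple != nullptr && tuple == last_seen_[tuple_idx]) return true;
    last_seen_[tuple_idx] = tuple;
    return false;
  }

 private:
  std::vector<const void*> last_seen_;
};
```

With only a global flag, the sender would have to reset every remembered
pointer at each batch boundary; with per-tuple information, the build side
pointer from a non-spilling join could keep deduplicating across batches.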