Csaba Ringhofer created IMPALA-13225:
----------------------------------------

             Summary: Tuple deduplication does not work in partitioned exchanges
                 Key: IMPALA-13225
                 URL: https://issues.apache.org/jira/browse/IMPALA-13225
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Csaba Ringhofer


RowBatch::Serialize() has deduplication logic that detects duplicate tuples 
(usually the result of joins) by comparing tuple pointers. This doesn't work in 
partitioned exchanges because every row is deep copied one by one when 
collecting rows for a given channel, so all tuple pointers end up distinct:
https://github.com/apache/impala/blob/d83b48cf72fa94ec7f6e55da409b4dff3350543b/be/src/runtime/krpc-data-stream-sender.cc#L645
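
To illustrate the effect, here is a minimal standalone sketch. It is not Impala 
code: Tuple, Row, CountSerializedTuples() and DeepCopyRows() are hypothetical 
stand-ins, and the pointer set is only a simplified model of pointer-based 
deduplication. It shows that rows sharing one tuple are deduplicated when the 
original pointers are kept, but not after a per-row deep copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a tuple; not Impala's Tuple class.
struct Tuple {
  std::string payload;
};

// For simplicity a row is modeled as a single tuple pointer.
using Row = const Tuple*;

// Simplified model of pointer-based dedup: a tuple counts as serialized
// only the first time its address is seen.
std::size_t CountSerializedTuples(const std::vector<Row>& rows) {
  std::unordered_set<const Tuple*> seen;
  std::size_t serialized = 0;
  for (Row row : rows) {
    assert(row != nullptr);
    if (seen.insert(row).second) ++serialized;
  }
  return serialized;
}

// Mimics collecting rows for a channel by deep copying each row separately.
// `storage` owns the copies so the returned pointers stay valid.
std::vector<Row> DeepCopyRows(const std::vector<Row>& rows,
                              std::vector<std::unique_ptr<Tuple>>* storage) {
  std::vector<Row> copies;
  copies.reserve(rows.size());
  for (Row row : rows) {
    storage->push_back(std::make_unique<Tuple>(*row));
    copies.push_back(storage->back().get());
  }
  return copies;
}

// Join-like input: n rows that all reference the same tuple.
std::size_t SerializedWithoutCopy(std::size_t n) {
  Tuple shared{"build-side tuple"};
  std::vector<Row> rows(n, &shared);
  return CountSerializedTuples(rows);
}

// Same input, but every row is deep copied before the dedup pass.
std::size_t SerializedAfterDeepCopy(std::size_t n) {
  Tuple shared{"build-side tuple"};
  std::vector<Row> rows(n, &shared);
  std::vector<std::unique_ptr<Tuple>> storage;
  return CountSerializedTuples(DeepCopyRows(rows, &storage));
}
```

In this model, SerializedWithoutCopy(4) counts the shared tuple once, while 
SerializedAfterDeepCopy(4) counts four tuples, because each copy has its own 
address even though the contents are identical.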

The deduplication was added a long time ago (it doesn't have a Jira):
https://gerrit.cloudera.org/#/c/573/
I am not sure whether it ever worked in the partitioned case (it should still 
work in broadcast exchanges, though).



