[jira] [Updated] (IMPALA-13225) Tuple deduplication does not work in partitioned exchanges

Csaba Ringhofer (Jira) Mon, 15 Jul 2024 23:42:03 -0700


     [ https://issues.apache.org/jira/browse/IMPALA-13225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Csaba Ringhofer updated IMPALA-13225:
-------------------------------------
    Labels: performance  (was: )

> Tuple deduplication does not work in partitioned exchanges
> ----------------------------------------------------------
>
>                 Key: IMPALA-13225
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13225
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: performance
>
> RowBatch::Serialize() has deduplication logic that detects duplicate tuples 
> (usually the result of joins) by comparing tuple pointers. This doesn't work 
> in partitioned exchanges because rows are deep copied one by one when 
> collecting rows for a given channel, so all tuple pointers are distinct:
> https://github.com/apache/impala/blob/d83b48cf72fa94ec7f6e55da409b4dff3350543b/be/src/runtime/krpc-data-stream-sender.cc#L645
> The deduplication was added a long time ago (it doesn't have a Jira):
> https://gerrit.cloudera.org/#/c/573/
> I am not sure whether it ever worked in the partitioned case (it should 
> work in broadcast exchanges, though).
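
The effect described above can be reproduced with a minimal C++ sketch. This is not Impala code: Tuple, Rows, CountSerializedTuples() and DeepCopyRowByRow() are hypothetical stand-ins, and the dedup shown only compares each row's tuple pointer with the previous row's, which is an assumption made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for illustration; not Impala's real Tuple type.
struct Tuple {
  int value;
};

// One tuple pointer per row (single-tuple rows keep the sketch small).
using Rows = std::vector<const Tuple*>;

// Counts how many tuples a serializer would write if it skipped any tuple
// whose pointer equals the previous row's pointer (pointer-based dedup).
std::size_t CountSerializedTuples(const Rows& rows) {
  std::size_t written = 0;
  const Tuple* last = nullptr;
  for (const Tuple* t : rows) {
    assert(t != nullptr);
    if (t == last) continue;  // same pointer as previous row: nothing new written
    ++written;
    last = t;
  }
  return written;
}

// Mimics collecting rows for one channel by deep copying each row on its own.
// Every copy gets fresh storage owned by `pool`, so tuples that were shared
// before the copy no longer share a pointer afterwards.
Rows DeepCopyRowByRow(const Rows& rows,
                      std::vector<std::unique_ptr<Tuple>>* pool) {
  Rows copied;
  copied.reserve(rows.size());
  for (const Tuple* t : rows) {
    pool->push_back(std::make_unique<Tuple>(*t));
    copied.push_back(pool->back().get());
  }
  return copied;
}
```

With three rows that all point at the same Tuple (as a join might produce), CountSerializedTuples() reports one tuple before the copy and three after it, even though the copied values are identical. Pointer identity is lost in the copy, so a pointer-based check has nothing to match on.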


