[ 
https://issues.apache.org/jira/browse/IMPALA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-10112.
-----------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Consider skipping FpRateTooHigh() check for bloom filters
> ---------------------------------------------------------
>
>                 Key: IMPALA-10112
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10112
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: performance
>             Fix For: Impala 4.0
>
>
> This check disables bloom filters on the sender side.
> It is inaccurate in cases where there are duplicate values of the filter key 
> on the build side. E.g. many-to-many join or a join with multiple keys. This 
> could be fixed with some effort, but is probably not worth it, because:
> * Partition filters are probably still worth evaluating even if there are 
> false positives, because it's cheap and eliminating a partition is still 
> beneficial.
> * Runtime filters are dynamically disabled on the scan side if they are 
> ineffective. I think we still also "evaluate" the always true filter, which 
> is cheaper than doing the hashing and bloom evaluation, but still not 
> entirely free.
> * The disabling is fairly unlikely to kick in for partitioned joins because 
> it's only applied to a small subset of the filter, before the Or() operation.
> So it's potentially harmful and only likely beneficial for broadcast join 
> filters, in which case it saves a small amount of scan CPU and, for global 
> filters, coordinator RPCs and broadcasting. It's unclear that the complexity 
> is worth it for this relatively small and uncertain benefit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to