Tim Armstrong created IMPALA-10112:
--------------------------------------

             Summary: Consider skipping FpRateTooHigh() check for bloom filters
                 Key: IMPALA-10112
                 URL: https://issues.apache.org/jira/browse/IMPALA-10112
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


The FpRateTooHigh() check disables a bloom filter on the sender side when its 
expected false positive rate is too high.

The check is inaccurate when the build side contains duplicate values of the 
filter key, e.g. in a many-to-many join or a join with multiple keys. This 
could be fixed with some effort, but it is probably not worth it, because:
* Partition filters are probably still worth evaluating even if there are false 
positives, because evaluating them is cheap and eliminating a partition is 
still beneficial.
* Runtime filters are dynamically disabled on the scan side if they are 
ineffective.
* The disabling is fairly unlikely to kick in for partitioned joins because 
it's only applied to a small subset of the filter, before the Or() operation.
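
To illustrate the two accuracy problems above, here is a rough sketch using the 
textbook bloom filter false positive estimate. It is not Impala's code: the 
function names, the filter size, the 0.75 threshold, the 8 hash functions and 
the assumption that the estimate is driven by the number of inserted rows are 
all illustrative assumptions.

```python
import math


def expected_fp_rate(num_keys: int, filter_bits: int, num_hashes: int = 8) -> float:
    """Textbook bloom filter estimate: (1 - e^(-k*n/m))^k."""
    if num_keys <= 0:
        return 0.0
    return (1.0 - math.exp(-num_hashes * num_keys / filter_bits)) ** num_hashes


def fp_rate_too_high(num_keys: int, filter_bits: int, max_fp_rate: float = 0.75) -> bool:
    """Hypothetical stand-in for the sender-side check (threshold is an assumption)."""
    return expected_fp_rate(num_keys, filter_bits) > max_fp_rate


FILTER_BITS = 8 * 1024 * 1024  # a 1 MiB filter, chosen only for illustration

# Case 1: duplicate keys on the build side (e.g. a many-to-many join).
# Inserting the same key again sets no new bits, so the real false positive
# rate depends on distinct keys, but a row-count-based estimate does not know that.
DISTINCT_KEYS = 500_000
DUPLICATES_PER_KEY = 50
rows_inserted = DISTINCT_KEYS * DUPLICATES_PER_KEY

disabled_by_row_count = fp_rate_too_high(rows_inserted, FILTER_BITS)
disabled_by_distinct_count = fp_rate_too_high(DISTINCT_KEYS, FILTER_BITS)

# Case 2: partitioned join. Each instance checks only its own share of the
# keys, before the partial filters are Or()-ed together.
NUM_INSTANCES = 20
TOTAL_KEYS = 20_000_000  # assumed distinct and evenly spread across instances
keys_per_instance = TOTAL_KEYS // NUM_INSTANCES

disabled_per_instance = fp_rate_too_high(keys_per_instance, FILTER_BITS)
merged_would_be_too_high = fp_rate_too_high(TOTAL_KEYS, FILTER_BITS)

if __name__ == "__main__":
    print("duplicates: disabled by row count?     ", disabled_by_row_count)
    print("duplicates: disabled by distinct count?", disabled_by_distinct_count)
    print("partitioned: disabled per instance?    ", disabled_per_instance)
    print("partitioned: merged filter too full?   ", merged_would_be_too_high)
```

Under these assumed numbers, the row-count estimate disables a filter that 
would actually be useful, while the per-instance check lets through a filter 
that is saturated after merging.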

So the check is potentially harmful and is likely to help only for broadcast 
join filters, where it saves a small amount of scan CPU and, for global 
filters, coordinator RPCs and broadcasting. It's unclear that the complexity is 
worth it for this relatively small and uncertain benefit.





