[ 
https://issues.apache.org/jira/browse/IMPALA-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187943#comment-17187943
 ] 

ASF subversion and git services commented on IMPALA-10110:
----------------------------------------------------------

Commit ea75e68f9e07d9bec211adea2c9d0e0dd0f9c3bc in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ea75e68 ]

IMPALA-10110: bloom filter target fpp query option

This adds a BLOOM_FILTER_ERROR_RATE option that takes a
value between 0 and 1 (exclusive) that can override
the default target false positive probability (fpp)
value of 0.75 for selecting the filter size.
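
To illustrate why a lower target fpp selects a larger filter, here is a
minimal sketch that uses the textbook sizing formula for a standard Bloom
filter. It is not Impala's implementation: Impala's filters are sized by
its own backend code, and the function names below (classic_bloom_bits,
round_up_pow2_bytes) are hypothetical. The power-of-two rounding is also
an assumption made for the illustration.

```python
import math


def classic_bloom_bits(ndv: int, fpp: float) -> int:
    """Bits needed by a textbook Bloom filter holding `ndv` distinct
    values at target false positive probability `fpp`, assuming the
    optimal number of hash functions: m = -n * ln(p) / (ln 2)^2.
    """
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    # Same constraint as the option described above: 0 < fpp < 1.
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    return math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))


def round_up_pow2_bytes(bits: int) -> int:
    """Round a bit count up to a power-of-two number of bytes
    (an assumption for this sketch, not taken from the commit)."""
    nbytes = max(1, math.ceil(bits / 8))
    return 1 << (nbytes - 1).bit_length()


if __name__ == "__main__":
    ndv = 1_000_000  # hypothetical number of distinct join keys
    for fpp in (0.75, 0.1, 0.01):
        size = round_up_pow2_bytes(classic_bloom_bits(ndv, fpp))
        print(f"target fpp={fpp}: {size} bytes")
```

In this sketch, tightening the target fpp for a fixed number of distinct
values never shrinks the filter, which is the trade-off the option
exposes: fewer false positives in exchange for more memory per filter.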

It does not affect whether filters are disabled
at runtime.

Adds estimated FPP and bloom size to the routing
table so we have some observability. Here is an
example:

tpch_kudu> select count(*) from customer join nation on n_nationkey = c_nationkey;

 ID  Src. Node  Tgt. Node(s)  Target type  Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size    Est fpp
-----------------------------------------------------------------------------------------------------------------------------------------
  1          2             0        LOCAL             false               0 (3)            N/A        N/A     true     MIN_MAX
  0          2             0        LOCAL             false               0 (3)            N/A        N/A     true     1.00 MB   1.04e-37

Testing:
Added a test that shows the query option affecting filter size.

Ran core tests.

Change-Id: Ifb123a0ea1e0e95d95df9837c1f0222fd60361f3
Reviewed-on: http://gerrit.cloudera.org:8080/16377
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Separate option to control fpp for bloom filter sizing
> ------------------------------------------------------
>
>                 Key: IMPALA-10110
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10110
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>
> Before IMPALA-6311, we should decouple the ideal target FPP used for sizing 
> the filters from --max_filter_error_rate, which is used to discard filters.
> We could add a query option like RUNTIME_FILTER_FPP.



