sascha-coenen opened a new issue #9413: SQL: BLOOM_FILTER() does not work on 
constant values fed in via sub query
URL: https://github.com/apache/druid/issues/9413
 
 
   ### Affected Version
   0.16.0 and 0.17.0
   
   ### Description
   Sometimes users need to feed large lists of tokens to an IN filter which 
slows down queries, so I was trying to rewrite such queries so that a bloom 
filter would capture the filter conditions.
   I found out that the bloom_filter() function only seems to work if applied 
directly to a table column.
   However, if constant values from a subquery get passed in, the SQL query 
cannot be transformed into a native JSON query due to rule exceptions.
   
   The following set of queries demonstrate at which point things break:
   
   `VALUES (1,2,3), (4,5,6), (7,8,9)`
   works
   
   `VALUES 'a', 'b', 'c'`
   works
   
   `SELECT blacklist.item FROM (VALUES 'a', 'b', 'c') AS blacklist (item)`
   works
   
   `SELECT COUNT(blacklist.item) FROM (VALUES 'a', 'b', 'c') AS blacklist 
(item)`
   works
   
   `SELECT BLOOM_FILTER(blacklist.item, 100) FROM (VALUES 'a', 'b', 'c') AS 
blacklist (item)`
   fails with the exception below.
   This is stange because the above query succeeds although the only difference 
is that another aggregation function got used, namely COUNT instead of 
BLOOM_FILTER but the structure of both queries is identical.
   
   The following works fine:
   
   `SELECT BLOOM_FILTER(CountryCode, 100) FROM revenue_statistics WHERE __time 
> TIMESTAMP '2019-01-01 00:00:00'`
   works
   
   exception:
   `SQL Error [00000]: Error -1 (00000) : Error while executing SQL "SELECT 
BLOOM_FILTER(blacklist.item, 2) FROM (VALUES 'a', 'b', 'c') AS blacklist 
(item)": Remote driver error: RuntimeException: 
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough 
rules to produce a node with desired properties: convention=DRUID, 
sort=[].¶Missing conversion is LogicalAggregate[convention: NONE -> 
DRUID]¶There is 1 empty subset: rel#519:Subset#2.DRUID.[], the relevant part of 
the original plan is as follows¶517:LogicalAggregate(group=[{}], 
EXPR$0=[BLOOM_FILTER($0, $1)])¶  
515:LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], item=[$0], $f1=[2])¶    
506:LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 'a' }, { 'b' 
}, { 'c' }]])¶¶Root: rel#519:Subset#2.DRUID.[]¶Original 
rel:¶LogicalAggregate(subset=[rel#519:Subset#2.DRUID.[]], group=[{}], 
EXPR$0=[BLOOM_FILTER($0, $1)]): rowcount = 1.0, cumulative cost = {1.125 rows, 
0.0 cpu, 0.0 io}, id = 517¶  LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], 
item=[$0], $f1=[2]): rowcount = 3.0, cumulative cost = {3.0 rows, 6.0 cpu, 0.0 
io}, id = 515¶    LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 
'a' }, { 'b' }, { 'c' }]]): rowcount = 3.0, cumulative cost = {3.0 rows, 1.0 
cpu, 0.0 io}, id = 506¶¶Sets:¶Set#0, type: RecordType(CHAR(1) item)¶ 
rel#514:Subset#0.NONE.[0], best=null, importance=0.7290000000000001¶  
rel#506:LogicalValues.NONE.[0](type=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 
'b' }, { 'c' }]), rowcount=3.0, cumulative cost={inf}¶Set#1, type: 
RecordType(CHAR(1) item, INTEGER $f1)¶ rel#516:Subset#1.NONE.[0], best=null, 
importance=0.81¶  
rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), 
rowcount=3.0, cumulative cost={inf}¶  rel#521:LogicalValues.NONE.[[0, 1], 
[1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, 
{ 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶ rel#522:Subset#1.NONE.[], 
best=null, importance=0.405¶  
rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), 
rowcount=3.0, cumulative cost={inf}¶  rel#521:LogicalValues.NONE.[[0, 1], 
[1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, 
{ 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶Set#2, type: RecordType(OTHER 
EXPR$0)¶ rel#518:Subset#2.NONE.[], best=null, importance=0.9¶  
rel#517:LogicalAggregate.NONE.[](input=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0,
 $1)), rowcount=1.0, cumulative cost={inf}¶ rel#519:Subset#2.DRUID.[], 
best=null, importance=1.0¶  
rel#520:AbstractConverter.DRUID.[](input=RelSubset#518,convention=DRUID,sort=[]),
 rowcount=1.0, cumulative cost={inf}¶¶Graphviz:¶digraph G {¶ root 
[style=filled,label="Root"];¶ subgraph cluster0{¶  label="Set 0 
RecordType(CHAR(1) item)";¶  rel506 
[label="rel#506:LogicalValues\ntype=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 
'b' }, { 'c' }]\nrows=3.0, cost={inf}",shape=box]¶  subset514 
[label="rel#514:Subset#0.NONE.[0]"]¶ }¶ subgraph cluster1{¶  label="Set 1 
RecordType(CHAR(1) item, INTEGER $f1)";¶  rel515 
[label="rel#515:LogicalProject\ninput=RelSubset#514,item=$0,$f1=2\nrows=3.0, 
cost={inf}",shape=box]¶  rel521 [label="rel#521:LogicalValues.NONE.[[0, 1], 
[1]]\ntype=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 
}, { 'c', 2 }]\nrows=3.0, cost={inf}",shape=box]¶  subset516 
[label="rel#516:Subset#1.NONE.[0]"]¶  subset522 
[label="rel#522:Subset#1.NONE.[]"]¶  subset522 -> subset516; }¶ subgraph 
cluster2{¶  label="Set 2 RecordType(OTHER EXPR$0)";¶  rel517 
[label="rel#517:LogicalAggregate\ninput=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0,
 $1)\nrows=1.0, cost={inf}",shape=box]¶  rel520 
[label="rel#520:AbstractConverter\ninput=RelSubset#518,convention=DRUID,sort=[]\nrows=1.0,
 cost={inf}",shape=box]¶  subset518 [label="rel#518:Subset#2.NONE.[]"]¶  
subset519 [label="rel#519:Subset#2.DRUID.[]",color=red]¶ }¶ root -> subset519;¶ 
subset514 -> rel506;¶ subset516 -> rel515; rel515 -> subset514;¶ subset522 -> 
rel521;¶ subset518 -> rel517; rel517 -> subset516;¶ subset519 -> rel520; rel520 
-> subset518;¶} -> CannotPlanException: There are not enough rules to produce a 
node with desired properties: convention=DRUID, sort=[].¶Missing conversion is 
LogicalAggregate[convention: NONE -> DRUID]¶There is 1 empty subset: 
rel#519:Subset#2.DRUID.[], the relevant part of the original plan is as 
follows¶517:LogicalAggregate(group=[{}], EXPR$0=[BLOOM_FILTER($0, $1)])¶  
515:LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], item=[$0], $f1=[2])¶    
506:LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 'a' }, { 'b' 
}, { 'c' }]])¶¶Root: rel#519:Subset#2.DRUID.[]¶Original 
rel:¶LogicalAggregate(subset=[rel#519:Subset#2.DRUID.[]], group=[{}], 
EXPR$0=[BLOOM_FILTER($0, $1)]): rowcount = 1.0, cumulative cost = {1.125 rows, 
0.0 cpu, 0.0 io}, id = 517¶  LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], 
item=[$0], $f1=[2]): rowcount = 3.0, cumulative cost = {3.0 rows, 6.0 cpu, 0.0 
io}, id = 515¶    LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 
'a' }, { 'b' }, { 'c' }]]): rowcount = 3.0, cumulative cost = {3.0 rows, 1.0 
cpu, 0.0 io}, id = 506¶¶Sets:¶Set#0, type: RecordType(CHAR(1) item)¶ 
rel#514:Subset#0.NONE.[0], best=null, importance=0.7290000000000001¶  
rel#506:LogicalValues.NONE.[0](type=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 
'b' }, { 'c' }]), rowcount=3.0, cumulative cost={inf}¶Set#1, type: 
RecordType(CHAR(1) item, INTEGER $f1)¶ rel#516:Subset#1.NONE.[0], best=null, 
importance=0.81¶  
rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), 
rowcount=3.0, cumulative cost={inf}¶  rel#521:LogicalValues.NONE.[[0, 1], 
[1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, 
{ 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶ rel#522:Subset#1.NONE.[], 
best=null, importance=0.405¶  
rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), 
rowcount=3.0, cumulative cost={inf}¶  rel#521:LogicalValues.NONE.[[0, 1], 
[1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, 
{ 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶Set#2, type: RecordType(OTHER 
EXPR$0)¶ rel#518:Subset#2.NONE.[], best=null, importance=0.9¶  
rel#517:LogicalAggregate.NONE.[](input=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0,
 $1)), rowcount=1.0, cumulative cost={inf}¶ rel#519:Subset#2.DRUID.[], 
best=null, importance=1.0¶  
rel#520:AbstractConverter.DRUID.[](input=RelSubset#518,convention=DRUID,sort=[]),
 rowcount=1.0, cumulative cost={inf}¶¶Graphviz:¶digraph G {¶ root 
[style=filled,label="Root"];¶ subgraph cluster0{¶  label="Set 0 
RecordType(CHAR(1) item)";¶  rel506 
[label="rel#506:LogicalValues\ntype=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 
'b' }, { 'c' }]\nrows=3.0, cost={inf}",shape=box]¶  subset514 
[label="rel#514:Subset#0.NONE.[0]"]¶ }¶ subgraph cluster1{¶  label="Set 1 
RecordType(CHAR(1) item, INTEGER $f1)";¶  rel515 
[label="rel#515:LogicalProject\ninput=RelSubset#514,item=$0,$f1=2\nrows=3.0, 
cost={inf}",shape=box]¶  rel521 [label="rel#521:LogicalValues.NONE.[[0, 1], 
[1]]\ntype=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 
}, { 'c', 2 }]\nrows=3.0, cost={inf}",shape=box]¶  subset516 
[label="rel#516:Subset#1.NONE.[0]"]¶  subset522 
[label="rel#522:Subset#1.NONE.[]"]¶  subset522 -> subset516; }¶ subgraph 
cluster2{¶  label="Set 2 RecordType(OTHER EXPR$0)";¶  rel517 
[label="rel#517:LogicalAggregate\ninput=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0,
 $1)\nrows=1.0, cost={inf}",shape=box]¶  rel520 
[label="rel#520:AbstractConverter\ninput=RelSubset#518,convention=DRUID,sort=[]\nrows=1.0,
 cost={inf}",shape=box]¶  subset518 [label="rel#518:Subset#2.NONE.[]"]¶  
subset519 [label="rel#519:Subset#2.DRUID.[]",color=red]¶ }¶ root -> subset519;¶ 
subset514 -> rel506;¶ subset516 -> rel515; rel515 -> subset514;¶ subset522 -> 
rel521;¶ subset518 -> rel517; rel517 -> subset516;¶ subset519 -> rel520; rel520 
-> subset518;¶}`
   
   
   
   
   
   
   
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to