sascha-coenen opened a new issue #9413: SQL: BLOOM_FILTER() does not work on constant values fed in via sub query URL: https://github.com/apache/druid/issues/9413 ### Affected Version 0.16.0 and 0.17.0 ### Description Sometimes users need to feed large lists of tokens to an IN filter which slows down queries, so I was trying to rewrite such queries so that a bloom filter would capture the filter conditions. I found out that the bloom_filter() function only seems to work if applied directly to a table column. However, if constant values from a subquery get passed in, the SQL query cannot be transformed into a native JSON query due to rule exceptions. The following set of queries demonstrate at which point things break: `VALUES (1,2,3), (4,5,6), (7,8,9)` works `VALUES 'a', 'b', 'c'` works `SELECT blacklist.item FROM (VALUES 'a', 'b', 'c') AS blacklist (item)` works `SELECT COUNT(blacklist.item) FROM (VALUES 'a', 'b', 'c') AS blacklist (item)` works `SELECT BLOOM_FILTER(blacklist.item, 100) FROM (VALUES 'a', 'b', 'c') AS blacklist (item)` fails with the exception below. This is stange because the above query succeeds although the only difference is that another aggregation function got used, namely COUNT instead of BLOOM_FILTER but the structure of both queries is identical. The following works fine: `SELECT BLOOM_FILTER(CountryCode, 100) FROM revenue_statistics WHERE __time > TIMESTAMP '2019-01-01 00:00:00'` works exception: `SQL Error [00000]: Error -1 (00000) : Error while executing SQL "SELECT BLOOM_FILTER(blacklist.item, 2) FROM (VALUES 'a', 'b', 'c') AS blacklist (item)": Remote driver error: RuntimeException: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=DRUID, sort=[].¶Missing conversion is LogicalAggregate[convention: NONE -> DRUID]¶There is 1 empty subset: rel#519:Subset#2.DRUID.[], the relevant part of the original plan is as follows¶517:LogicalAggregate(group=[{}], EXPR$0=[BLOOM_FILTER($0, $1)])¶ 515:LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], item=[$0], $f1=[2])¶ 506:LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 'a' }, { 'b' }, { 'c' }]])¶¶Root: rel#519:Subset#2.DRUID.[]¶Original rel:¶LogicalAggregate(subset=[rel#519:Subset#2.DRUID.[]], group=[{}], EXPR$0=[BLOOM_FILTER($0, $1)]): rowcount = 1.0, cumulative cost = {1.125 rows, 0.0 cpu, 0.0 io}, id = 517¶ LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], item=[$0], $f1=[2]): rowcount = 3.0, cumulative cost = {3.0 rows, 6.0 cpu, 0.0 io}, id = 515¶ LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 'a' }, { 'b' }, { 'c' }]]): rowcount = 3.0, cumulative cost = {3.0 rows, 1.0 cpu, 0.0 io}, id = 506¶¶Sets:¶Set#0, type: RecordType(CHAR(1) item)¶ rel#514:Subset#0.NONE.[0], best=null, importance=0.7290000000000001¶ rel#506:LogicalValues.NONE.[0](type=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 'b' }, { 'c' }]), rowcount=3.0, cumulative cost={inf}¶Set#1, type: RecordType(CHAR(1) item, INTEGER $f1)¶ rel#516:Subset#1.NONE.[0], best=null, importance=0.81¶ rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), rowcount=3.0, cumulative cost={inf}¶ rel#521:LogicalValues.NONE.[[0, 1], [1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, { 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶ rel#522:Subset#1.NONE.[], best=null, importance=0.405¶ rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), rowcount=3.0, cumulative cost={inf}¶ rel#521:LogicalValues.NONE.[[0, 1], [1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, { 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶Set#2, type: RecordType(OTHER EXPR$0)¶ rel#518:Subset#2.NONE.[], best=null, importance=0.9¶ rel#517:LogicalAggregate.NONE.[](input=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0, $1)), rowcount=1.0, cumulative cost={inf}¶ rel#519:Subset#2.DRUID.[], best=null, importance=1.0¶ rel#520:AbstractConverter.DRUID.[](input=RelSubset#518,convention=DRUID,sort=[]), rowcount=1.0, cumulative cost={inf}¶¶Graphviz:¶digraph G {¶ root [style=filled,label="Root"];¶ subgraph cluster0{¶ label="Set 0 RecordType(CHAR(1) item)";¶ rel506 [label="rel#506:LogicalValues\ntype=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 'b' }, { 'c' }]\nrows=3.0, cost={inf}",shape=box]¶ subset514 [label="rel#514:Subset#0.NONE.[0]"]¶ }¶ subgraph cluster1{¶ label="Set 1 RecordType(CHAR(1) item, INTEGER $f1)";¶ rel515 [label="rel#515:LogicalProject\ninput=RelSubset#514,item=$0,$f1=2\nrows=3.0, cost={inf}",shape=box]¶ rel521 [label="rel#521:LogicalValues.NONE.[[0, 1], [1]]\ntype=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, { 'c', 2 }]\nrows=3.0, cost={inf}",shape=box]¶ subset516 [label="rel#516:Subset#1.NONE.[0]"]¶ subset522 [label="rel#522:Subset#1.NONE.[]"]¶ subset522 -> subset516; }¶ subgraph cluster2{¶ label="Set 2 RecordType(OTHER EXPR$0)";¶ rel517 [label="rel#517:LogicalAggregate\ninput=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0, $1)\nrows=1.0, cost={inf}",shape=box]¶ rel520 [label="rel#520:AbstractConverter\ninput=RelSubset#518,convention=DRUID,sort=[]\nrows=1.0, cost={inf}",shape=box]¶ subset518 [label="rel#518:Subset#2.NONE.[]"]¶ subset519 [label="rel#519:Subset#2.DRUID.[]",color=red]¶ }¶ root -> subset519;¶ subset514 -> rel506;¶ subset516 -> rel515; rel515 -> subset514;¶ subset522 -> rel521;¶ subset518 -> rel517; rel517 -> subset516;¶ subset519 -> rel520; rel520 -> subset518;¶} -> CannotPlanException: There are not enough rules to produce a node with desired properties: convention=DRUID, sort=[].¶Missing conversion is LogicalAggregate[convention: NONE -> DRUID]¶There is 1 empty subset: rel#519:Subset#2.DRUID.[], the relevant part of the original plan is as follows¶517:LogicalAggregate(group=[{}], EXPR$0=[BLOOM_FILTER($0, $1)])¶ 515:LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], item=[$0], $f1=[2])¶ 506:LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 'a' }, { 'b' }, { 'c' }]])¶¶Root: rel#519:Subset#2.DRUID.[]¶Original rel:¶LogicalAggregate(subset=[rel#519:Subset#2.DRUID.[]], group=[{}], EXPR$0=[BLOOM_FILTER($0, $1)]): rowcount = 1.0, cumulative cost = {1.125 rows, 0.0 cpu, 0.0 io}, id = 517¶ LogicalProject(subset=[rel#516:Subset#1.NONE.[0]], item=[$0], $f1=[2]): rowcount = 3.0, cumulative cost = {3.0 rows, 6.0 cpu, 0.0 io}, id = 515¶ LogicalValues(subset=[rel#514:Subset#0.NONE.[0]], tuples=[[{ 'a' }, { 'b' }, { 'c' }]]): rowcount = 3.0, cumulative cost = {3.0 rows, 1.0 cpu, 0.0 io}, id = 506¶¶Sets:¶Set#0, type: RecordType(CHAR(1) item)¶ rel#514:Subset#0.NONE.[0], best=null, importance=0.7290000000000001¶ rel#506:LogicalValues.NONE.[0](type=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 'b' }, { 'c' }]), rowcount=3.0, cumulative cost={inf}¶Set#1, type: RecordType(CHAR(1) item, INTEGER $f1)¶ rel#516:Subset#1.NONE.[0], best=null, importance=0.81¶ rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), rowcount=3.0, cumulative cost={inf}¶ rel#521:LogicalValues.NONE.[[0, 1], [1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, { 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶ rel#522:Subset#1.NONE.[], best=null, importance=0.405¶ rel#515:LogicalProject.NONE.[0](input=RelSubset#514,item=$0,$f1=2), rowcount=3.0, cumulative cost={inf}¶ rel#521:LogicalValues.NONE.[[0, 1], [1]](type=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, { 'c', 2 }]), rowcount=3.0, cumulative cost={inf}¶Set#2, type: RecordType(OTHER EXPR$0)¶ rel#518:Subset#2.NONE.[], best=null, importance=0.9¶ rel#517:LogicalAggregate.NONE.[](input=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0, $1)), rowcount=1.0, cumulative cost={inf}¶ rel#519:Subset#2.DRUID.[], best=null, importance=1.0¶ rel#520:AbstractConverter.DRUID.[](input=RelSubset#518,convention=DRUID,sort=[]), rowcount=1.0, cumulative cost={inf}¶¶Graphviz:¶digraph G {¶ root [style=filled,label="Root"];¶ subgraph cluster0{¶ label="Set 0 RecordType(CHAR(1) item)";¶ rel506 [label="rel#506:LogicalValues\ntype=RecordType(CHAR(1) item),tuples=[{ 'a' }, { 'b' }, { 'c' }]\nrows=3.0, cost={inf}",shape=box]¶ subset514 [label="rel#514:Subset#0.NONE.[0]"]¶ }¶ subgraph cluster1{¶ label="Set 1 RecordType(CHAR(1) item, INTEGER $f1)";¶ rel515 [label="rel#515:LogicalProject\ninput=RelSubset#514,item=$0,$f1=2\nrows=3.0, cost={inf}",shape=box]¶ rel521 [label="rel#521:LogicalValues.NONE.[[0, 1], [1]]\ntype=RecordType(CHAR(1) item, INTEGER $f1),tuples=[{ 'a', 2 }, { 'b', 2 }, { 'c', 2 }]\nrows=3.0, cost={inf}",shape=box]¶ subset516 [label="rel#516:Subset#1.NONE.[0]"]¶ subset522 [label="rel#522:Subset#1.NONE.[]"]¶ subset522 -> subset516; }¶ subgraph cluster2{¶ label="Set 2 RecordType(OTHER EXPR$0)";¶ rel517 [label="rel#517:LogicalAggregate\ninput=RelSubset#516,group={},EXPR$0=BLOOM_FILTER($0, $1)\nrows=1.0, cost={inf}",shape=box]¶ rel520 [label="rel#520:AbstractConverter\ninput=RelSubset#518,convention=DRUID,sort=[]\nrows=1.0, cost={inf}",shape=box]¶ subset518 [label="rel#518:Subset#2.NONE.[]"]¶ subset519 [label="rel#519:Subset#2.DRUID.[]",color=red]¶ }¶ root -> subset519;¶ subset514 -> rel506;¶ subset516 -> rel515; rel515 -> subset514;¶ subset522 -> rel521;¶ subset518 -> rel517; rel517 -> subset516;¶ subset519 -> rel520; rel520 -> subset518;¶}`
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
