Hi Alessandro,

I am new to Calcite, and just started using Calcite recently, and wanted to 
understand how the logical to physical conversion works.

I wrote a SQL statement, but the Schema / Table class I specified in model 
json/yaml are not ElasticsearchSchema/ElasticsearchTable, you can assume they 
are MyOwnSchema/MyOwnTable. I firstly got the LogicalProject/Filter/TableScan 
nodes tree by parsing the SQL, then converted the nodes to 
EnumerableProject/Filter/TableScan nodes.

It's unclear to me whether I can convert the Enumerable RelNode I mentioned 
above to Elasticsearch RelNode or I have to use 
Elasticsearch/ElasticsearchTable in model json/yaml to parse the SQL statement? 
I assumed Calcite can convert any supported SQL to Elasticsearch DSL no matter 
what the Schema/Table specification of the SQL are.

Thanks,
Justin

From
Alessandro Solimando 
[email protected]<mailto:[email protected]>
Subject
Re: Need help: exception when using Elasticsearch adapter
Date
Thu, 30 Sep 2021 06:14:05 GMT
Each adapter "speaks" the language of the supported data source, but only
to the extent of querying it, and for the tables stored in it.

If your input tables are not from ES I have a hard time to even understand
what you are trying to achieve in this way.

Consider also that a plan consists generally of operations that are not
always supported in all data sources, that's why when querying data the
adapter "knows" if a filter/join/etc. can be pushed-down/expressed in the
data source query language, if not they will be executed in the Enumerable
convention.

Can you elaborate a bit more on what you are trying to achieve?


From: Justin Huang
Sent: Thursday, September 30, 2021 8:02 AM
To: '[email protected]' <[email protected]>
Subject: RE: Need help: exception when using Elasticsearch adapter

I can convert the Logical RelNode to Enumerable RelNode, and then I added 
another step to convert Enumerable RelNode to Elasticsearch RelNode:

def toEsPhysicalPlan(root):
    print(f"toEsPhysicalPlan root={root}")
    planner = root.getCluster().getPlanner()
    planner.clear()
    for rule in ElasticsearchRules.RULES:
        planner.addRule(rule)

    program = Programs.of(RuleSets.ofList(planner.getRules()))
    traits = root.getTraitSet().replace(ElasticsearchRel.CONVENTION)
return program.run(planner, root, traits, ImmutableList.of(), 
ImmutableList.of())

Now the program.run failed with below exception, what can be the reason? Is 
there any example that shows how to convert a SQL to Elasticsearch Json? BTW, 
my input schema/table is not of ElasticsearchSchema/Table type.

org.apache.calcite.plan.RelOptPlanner.CannotPlanException: 
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough 
rules to produce a node with desired properties: convention=ELASTICSEARCH, 
sort=[].
Missing conversion is EnumerableProject[convention: ENUMERABLE -> ELASTICSEARCH]
There is 1 empty subset: rel#67:RelSubset#3.ELASTICSEARCH.[], the relevant part 
of the original plan is as follows
65:EnumerableProject(BugId=[$0], ARB=[$1])
  60:EnumerableTableScan(subset=[rel#64:RelSubset#2.ENUMERABLE.[]], 
table=[[root, table_bugx]])

Root: rel#67:RelSubset#3.ELASTICSEARCH.[]
Original rel:
EnumerableProject(subset=[rel#58:RelSubset#1.ENUMERABLE.[]], BugId=[$0], 
ARB=[$1]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, 
id = 62
  EnumerableTableScan(subset=[rel#61:RelSubset#0.ENUMERABLE.[]], table=[[root, 
table_bugx]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
io}, id = 60

Sets:
Set#2, type: RecordType(BIGINT BugId, VARCHAR ARB)
        rel#64:RelSubset#2.ENUMERABLE.[], best=rel#60
                rel#60:EnumerableTableScan.ENUMERABLE.[](table=[root, 
table_bugx]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
Set#3, type: RecordType(BIGINT BugId, VARCHAR ARB)
        rel#66:RelSubset#3.ENUMERABLE.[], best=rel#65
                
rel#65:EnumerableProject.ENUMERABLE.[](input=RelSubset#64,inputs=0..1), 
rowcount=100.0, cumulative cost={200.0 rows, 301.0 cpu, 0.0 io}
        rel#67:RelSubset#3.ELASTICSEARCH.[], best=null
                
rel#68:AbstractConverter.ELASTICSEARCH.[](input=RelSubset#66,convention=ELASTICSEARCH,sort=[]),
 rowcount=100.0, cumulative cost={inf}

Graphviz:
digraph G {
        root [style=filled,label="Root"];
        subgraph cluster2{
                label="Set 2 RecordType(BIGINT BugId, VARCHAR ARB)";
                rel60 [label="rel#60:EnumerableTableScan\ntable=[root, 
table_bugx]\nrows=100.0, cost={100.0 rows, 101.0 cpu, 0.0 
io}",color=blue,shape=box]
                subset64 [label="rel#64:RelSubset#2.ENUMERABLE.[]"]
        }
        subgraph cluster3{
                label="Set 3 RecordType(BIGINT BugId, VARCHAR ARB)";
                rel65 
[label="rel#65:EnumerableProject\ninput=RelSubset#64,inputs=0..1\nrows=100.0, 
cost={200.0 rows, 301.0 cpu, 0.0 io}",color=blue,shape=box]
                rel68 
[label="rel#68:AbstractConverter\ninput=RelSubset#66,convention=ELASTICSEARCH,sort=[]\nrows=100.0,
 cost={inf}",shape=box]
                subset66 [label="rel#66:RelSubset#3.ENUMERABLE.[]"]
                subset67 [label="rel#67:RelSubset#3.ELASTICSEARCH.[]",color=red]
        }
        root -> subset67;
        subset64 -> rel60[color=blue];
        subset66 -> rel65[color=blue]; rel65 -> subset64[color=blue];
        subset67 -> rel68; rel68 -> subset66;
}

Thanks,
Jusitn

From: Justin Huang
Sent: Monday, September 27, 2021 1:56 PM
To: [email protected]<mailto:[email protected]>
Subject: Need help: exception when using Elasticsearch adapter

Hi Calcite developers,

I tried to convert a simple 'SELECT * FROM xxx' statement to Elasticsearch DSL 
using the Elasticsearch adapter.

The toPhysicalPlan(relNode) failed when running program.run. Can anyone shed 
some lights on this? Any requirements on the Table/Schema? The table I used in 
my class is a DaskTable 
<https://github.com/dask-contrib/dask-sql/blob/main/planner/src/main/java/com/dask/sql/schema/DaskTable.java>
  (from dask-sql project) which implements the interface 
ProjectableFilterableTable.

Thanks,
Justin

I am using Calcite in Python using with Jpype module, and the function sql2dsl 
converts RelNode to Elasticsearch DSL:

def toPhysicalPlan(root):
    rules = (
                EnumerableRules.ENUMERABLE_PROJECT_RULE,
                EnumerableRules.ENUMERABLE_FILTER_RULE,
                EnumerableRules.ENUMERABLE_AGGREGATE_RULE,
                EnumerableRules.ENUMERABLE_SORT_RULE,
                EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE
            )

    planner = root.getCluster().getPlanner()
    planner.clear()
    for rule in rules:
        planner.addRule(rule)

    program = Programs.of(RuleSets.ofList(planner.getRules()))
    traits = root.getTraitSet().replace(EnumerableConvention.INSTANCE)
    return program.run(planner, root, traits, ImmutableList.of(), 
ImmutableList.of())

def toElasticsearchQuery(root):
    relImplementor = 
EnumerableRelImplementor(root.getCluster().getRexBuilder(), ImmutableMap.of())
    elasticsearchImplementor = ElasticsearchRel.Implementor()
    elasticsearchImplementor.visitChild(0, root)
    for x in elasticsearchImplementor.list:
        print(f"x: {x}")

def sql2dsl(relNode):
    esPhysicalPlan = toPhysicalPlan(relNode)
    esJson = toElasticsearchQuery(esPhysicalPlan)
    print(f"esJson={str(esJson)}")

Here are the error messages:

org.apache.calcite.plan.RelOptPlanner.CannotPlanException: 
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough 
rules to produce a node with desired properties: convention=ENUMERABLE, sort=[].
Missing conversion is LogicalTableScan[convention: NONE -> ENUMERABLE]
There is 1 empty subset: rel#51:RelSubset#0.ENUMERABLE.[], the relevant part of 
the original plan is as follows
47:LogicalTableScan(table=[[xxx]])

Root: rel#51:RelSubset#0.ENUMERABLE.[]
Original rel:
LogicalTableScan(subset=[rel#51:RelSubset#0.ENUMERABLE.[]], table=[[xxx]]): 
rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 47

Sets:
Set#0, type: RecordType(VARCHAR name, BIGINT age, VARCHAR position, BIGINT 
BugId)
        rel#50:RelSubset#0.NONE.[], best=null
                rel#47:LogicalTableScan.NONE.[](table=[xxx]), rowcount=100.0, 
cumulative cost={inf}
        rel#51:RelSubset#0.ENUMERABLE.[], best=null
                
rel#52:AbstractConverter.ENUMERABLE.[](input=RelSubset#50,convention=ENUMERABLE,sort=[]),
 rowcount=100.0, cumulative cost={inf}

Graphviz:
digraph G {
        root [style=filled,label="Root"];
        subgraph cluster0{
                label="Set 0 RecordType(VARCHAR name, BIGINT age, VARCHAR 
position, BIGINT BugId)";
                rel47 [label="rel#47:LogicalTableScan\ntable=[xxx]\nrows=100.0, 
cost={inf}",shape=box]
                rel52 
[label="rel#52:AbstractConverter\ninput=RelSubset#50,convention=ENUMERABLE,sort=[]\nrows=100.0,
 cost={inf}",shape=box]
                subset50 [label="rel#50:RelSubset#0.NONE.[]"]
                subset51 [label="rel#51:RelSubset#0.ENUMERABLE.[]",color=red]
        }
        root -> subset51;
        subset50 -> rel47;
        subset51 -> rel52; rel52 -> subset50;
}

Reply via email to