Hi Alessandro, I am new to Calcite, and just started using Calcite recently, and wanted to understand how the logical to physical conversion works.
I wrote a SQL statement, but the Schema / Table class I specified in model json/yaml are not ElasticsearchSchema/ElasticsearchTable, you can assume they are MyOwnSchema/MyOwnTable. I firstly got the LogicalProject/Filter/TableScan nodes tree by parsing the SQL, then converted the nodes to EnumerableProject/Filter/TableScan nodes. It's unclear to me whether I can convert the Enumerable RelNode I mentioned above to Elasticsearch RelNode or I have to use Elasticsearch/ElasticsearchTable in model json/yaml to parse the SQL statement? I assumed Calcite can convert any supported SQL to Elasticsearch DSL no matter what the Schema/Table specification of the SQL are. Thanks, Justin From Alessandro Solimando [email protected]<mailto:[email protected]> Subject Re: Need help: exception when using Elasticsearch adapter Date Thu, 30 Sep 2021 06:14:05 GMT Each adapter "speaks" the language of the supported data source, but only to the extent of querying it, and for the tables stored in it. If your input tables are not from ES I have a hard time to even understand what you are trying to achieve in this way. Consider also that a plan consists generally of operations that are not always supported in all data sources, that's why when querying data the adapter "knows" if a filter/join/etc. can be pushed-down/expressed in the data source query language, if not they will be executed in the Enumerable convention. Can you elaborate a bit more on what you are trying to achieve? From: Justin Huang Sent: Thursday, September 30, 2021 8:02 AM To: '[email protected]' <[email protected]> Subject: RE: Need help: exception when using Elasticsearch adapter I can convert the Logical RelNode to Enumerable RelNode, and then I added another step to convert Enumerable RelNode to Elasticsearch RelNode: def toEsPhysicalPlan(root): print(f"toEsPhysicalPlan root={root}") planner = root.getCluster().getPlanner() planner.clear() for rule in ElasticsearchRules.RULES: planner.addRule(rule) program = Programs.of(RuleSets.ofList(planner.getRules())) traits = root.getTraitSet().replace(ElasticsearchRel.CONVENTION) return program.run(planner, root, traits, ImmutableList.of(), ImmutableList.of()) Now the program.run failed with below exception, what can be the reason? Is there any example that shows how to convert a SQL to Elasticsearch Json? BTW, my input schema/table is not of ElasticsearchSchema/Table type. org.apache.calcite.plan.RelOptPlanner.CannotPlanException: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=ELASTICSEARCH, sort=[]. Missing conversion is EnumerableProject[convention: ENUMERABLE -> ELASTICSEARCH] There is 1 empty subset: rel#67:RelSubset#3.ELASTICSEARCH.[], the relevant part of the original plan is as follows 65:EnumerableProject(BugId=[$0], ARB=[$1]) 60:EnumerableTableScan(subset=[rel#64:RelSubset#2.ENUMERABLE.[]], table=[[root, table_bugx]]) Root: rel#67:RelSubset#3.ELASTICSEARCH.[] Original rel: EnumerableProject(subset=[rel#58:RelSubset#1.ENUMERABLE.[]], BugId=[$0], ARB=[$1]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 62 EnumerableTableScan(subset=[rel#61:RelSubset#0.ENUMERABLE.[]], table=[[root, table_bugx]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 60 Sets: Set#2, type: RecordType(BIGINT BugId, VARCHAR ARB) rel#64:RelSubset#2.ENUMERABLE.[], best=rel#60 rel#60:EnumerableTableScan.ENUMERABLE.[](table=[root, table_bugx]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} Set#3, type: RecordType(BIGINT BugId, VARCHAR ARB) rel#66:RelSubset#3.ENUMERABLE.[], best=rel#65 rel#65:EnumerableProject.ENUMERABLE.[](input=RelSubset#64,inputs=0..1), rowcount=100.0, cumulative cost={200.0 rows, 301.0 cpu, 0.0 io} rel#67:RelSubset#3.ELASTICSEARCH.[], best=null rel#68:AbstractConverter.ELASTICSEARCH.[](input=RelSubset#66,convention=ELASTICSEARCH,sort=[]), rowcount=100.0, cumulative cost={inf} Graphviz: digraph G { root [style=filled,label="Root"]; subgraph cluster2{ label="Set 2 RecordType(BIGINT BugId, VARCHAR ARB)"; rel60 [label="rel#60:EnumerableTableScan\ntable=[root, table_bugx]\nrows=100.0, cost={100.0 rows, 101.0 cpu, 0.0 io}",color=blue,shape=box] subset64 [label="rel#64:RelSubset#2.ENUMERABLE.[]"] } subgraph cluster3{ label="Set 3 RecordType(BIGINT BugId, VARCHAR ARB)"; rel65 [label="rel#65:EnumerableProject\ninput=RelSubset#64,inputs=0..1\nrows=100.0, cost={200.0 rows, 301.0 cpu, 0.0 io}",color=blue,shape=box] rel68 [label="rel#68:AbstractConverter\ninput=RelSubset#66,convention=ELASTICSEARCH,sort=[]\nrows=100.0, cost={inf}",shape=box] subset66 [label="rel#66:RelSubset#3.ENUMERABLE.[]"] subset67 [label="rel#67:RelSubset#3.ELASTICSEARCH.[]",color=red] } root -> subset67; subset64 -> rel60[color=blue]; subset66 -> rel65[color=blue]; rel65 -> subset64[color=blue]; subset67 -> rel68; rel68 -> subset66; } Thanks, Jusitn From: Justin Huang Sent: Monday, September 27, 2021 1:56 PM To: [email protected]<mailto:[email protected]> Subject: Need help: exception when using Elasticsearch adapter Hi Calcite developers, I tried to convert a simple 'SELECT * FROM xxx' statement to Elasticsearch DSL using the Elasticsearch adapter. The toPhysicalPlan(relNode) failed when running program.run. Can anyone shed some lights on this? Any requirements on the Table/Schema? The table I used in my class is a DaskTable <https://github.com/dask-contrib/dask-sql/blob/main/planner/src/main/java/com/dask/sql/schema/DaskTable.java> (from dask-sql project) which implements the interface ProjectableFilterableTable. Thanks, Justin I am using Calcite in Python using with Jpype module, and the function sql2dsl converts RelNode to Elasticsearch DSL: def toPhysicalPlan(root): rules = ( EnumerableRules.ENUMERABLE_PROJECT_RULE, EnumerableRules.ENUMERABLE_FILTER_RULE, EnumerableRules.ENUMERABLE_AGGREGATE_RULE, EnumerableRules.ENUMERABLE_SORT_RULE, EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE ) planner = root.getCluster().getPlanner() planner.clear() for rule in rules: planner.addRule(rule) program = Programs.of(RuleSets.ofList(planner.getRules())) traits = root.getTraitSet().replace(EnumerableConvention.INSTANCE) return program.run(planner, root, traits, ImmutableList.of(), ImmutableList.of()) def toElasticsearchQuery(root): relImplementor = EnumerableRelImplementor(root.getCluster().getRexBuilder(), ImmutableMap.of()) elasticsearchImplementor = ElasticsearchRel.Implementor() elasticsearchImplementor.visitChild(0, root) for x in elasticsearchImplementor.list: print(f"x: {x}") def sql2dsl(relNode): esPhysicalPlan = toPhysicalPlan(relNode) esJson = toElasticsearchQuery(esPhysicalPlan) print(f"esJson={str(esJson)}") Here are the error messages: org.apache.calcite.plan.RelOptPlanner.CannotPlanException: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=ENUMERABLE, sort=[]. Missing conversion is LogicalTableScan[convention: NONE -> ENUMERABLE] There is 1 empty subset: rel#51:RelSubset#0.ENUMERABLE.[], the relevant part of the original plan is as follows 47:LogicalTableScan(table=[[xxx]]) Root: rel#51:RelSubset#0.ENUMERABLE.[] Original rel: LogicalTableScan(subset=[rel#51:RelSubset#0.ENUMERABLE.[]], table=[[xxx]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 47 Sets: Set#0, type: RecordType(VARCHAR name, BIGINT age, VARCHAR position, BIGINT BugId) rel#50:RelSubset#0.NONE.[], best=null rel#47:LogicalTableScan.NONE.[](table=[xxx]), rowcount=100.0, cumulative cost={inf} rel#51:RelSubset#0.ENUMERABLE.[], best=null rel#52:AbstractConverter.ENUMERABLE.[](input=RelSubset#50,convention=ENUMERABLE,sort=[]), rowcount=100.0, cumulative cost={inf} Graphviz: digraph G { root [style=filled,label="Root"]; subgraph cluster0{ label="Set 0 RecordType(VARCHAR name, BIGINT age, VARCHAR position, BIGINT BugId)"; rel47 [label="rel#47:LogicalTableScan\ntable=[xxx]\nrows=100.0, cost={inf}",shape=box] rel52 [label="rel#52:AbstractConverter\ninput=RelSubset#50,convention=ENUMERABLE,sort=[]\nrows=100.0, cost={inf}",shape=box] subset50 [label="rel#50:RelSubset#0.NONE.[]"] subset51 [label="rel#51:RelSubset#0.ENUMERABLE.[]",color=red] } root -> subset51; subset50 -> rel47; subset51 -> rel52; rel52 -> subset50; }
