Hi folks,

I did some discussions already with others but wanted to bring the topic now to 
the list.
Currently, we have no “real” Query Optimizer in IoTDB.
We do some heuristics to optimize queries but not in the way a regular RDBMS 
does things.

Some of you may have heard of the Apache Calcite Framework which is basically a 
complete SQL Frontend which does Query parsing, generation of a Relational Node 
Tree and Query Optimization.
As we are not using regular SQL, we cannot rely on the Query parser and other 
features of the Framework.
But we could use the Optimizer, if we are able to represent our Queries in a 
“relational” way.

I did several experiments on how to integrate Calcite in IoTDB in a way that it 
gets a QueryPlan from IoTDB and later returns a QuerySet which makes it totally 
transparent for the IoTDB server and simply replaces the Logic in the current 
QueryRouter.

To give you a concrete example, take a simple (IoTDB) query like:

select vehicle.d0.s1 from root where root.vehicle.d0.s0 > 100

To represent this as a Relational Expression, we must first consider how the 
tables look. In Database Language we have two input tables (the timeseries 
root.vehicle.d0.s0 and root.vehicle.d0.s0) which can both be considered as 
relational tables of type <LONG (for the timestamp), X (for their respective 
type)>.

The Logical Query Plan, if we would do aboves operation on Tables then looks 
like that:

LogicalProject(time=[$0], s1=[$1])
  LogicalFilter(condition=[>($2, 100:JavaType(long))])
    LogicalSort(sort0=[$0], dir0=[ASC])
      LogicalProject(time=[COALESCE($0, $2)], s1=[$1], s0=[$3])
        LogicalJoin(condition=[=($0, $2)], joinType=[left])
          LogicalTableScan(table=[[root, root.test.d0.s1]])
          LogicalTableScan(table=[[root, root.vehicle.d0.s0]])

Of course this could be highly optimized with specific rules (as both tables 
are sorted the join is way easier than a regular join, etc).
If we throw that now into Calcites Query Optimizer (without any IoTDB specific 
rules), we get:

BindableProject(time=[$0], s1=[$1])
  BindableFilter(condition=[>($2, 100:JavaType(long))])
    BindableSort(sort0=[$0], dir0=[ASC])
      BindableProject(time=[COALESCE($0, $2)], s1=[$1], s0=[$3])
        BindableJoin(condition=[=($0, $2)], joinType=[left])
          BindableTableScan(table=[[root, root.test.d0.s1]])
          BindableTableScan(table=[[root, root.vehicle.d0.s0]])

Which then generates the code we can execute.

So what did we win?
Nothing, so far.

BUT we could now add specific rules e.g. to push the filter down, or to do a 
more effective Join to get en-par with our current performance.
And in the next step, we could introduce INDEXES which in this scenario could 
be simply “materialized views” with pre-cached results that were again stored 
as a table.

So I know that this is a bit theoretic but if people are interested in this 
approach I can go into more details or even suggest to do a design doc or a 
video discussion about the pros and cons of this approach.

I do have a branch with a draft implementation where this feature can simply be 
switched on and off, so we can easily experiment with it without the danger of 
degrading our performance.

Best!
Julian

Reply via email to