1. I noticed that the value in LIMIT can change the query plan.
For example, if I run explain plan for "select \"tweet_id\" from
\"userline\" limit 1", the plan is:
PLAN=EnumerableCalc(expr#0..2=[{inputs}], tweet_id=[$t2])
  EnumerableLimit(fetch=[1])
    CassandraToEnumerableConverter
      CassandraTableScan(table=[[twissandra, userline]])
But if the limit value is 8 or larger, such as for "select \"tweet_id\" from
\"userline\" limit 8", the plan is:
PLAN=CassandraToEnumerableConverter
  CassandraLimit(fetch=[8])
    CassandraProject(tweet_id=[$2])
      CassandraTableScan(table=[[twissandra, userline]])
Since computeSelfCost() depends on the row count, I guess that is why the
query plans differ. However, I want the query plan to stay the same as the
second one even when the limit value is smaller than 8. How should I
tweak computeSelfCost()?
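To make my guess concrete, here is a toy model in plain Java. It is only a sketch of the effect I suspect: the class name, the cost formulas, and all constants are made up for illustration and are not taken from Calcite or the Cassandra adapter. It shows how a row-count-dependent cost can flip the cheaper plan at some threshold, and how removing a hypothetical fixed overhead would keep the pushed-down plan cheaper for every limit value.

```java
/** Toy model only: every constant here is hypothetical, not Calcite's. */
final class LimitCostSketch {
    // Hypothetical fixed overhead charged to the pushed-down (Cassandra) plan.
    static final double PUSHDOWN_OVERHEAD = 5.0;
    // Hypothetical per-row cost factors for the two candidate plans.
    static final double ENUMERABLE_PER_ROW = 1.0;
    static final double CASSANDRA_PER_ROW = 0.1;

    // Cost of the plan that applies the limit on the Enumerable side.
    static double enumerableCost(double rows) {
        return rows * ENUMERABLE_PER_ROW;
    }

    // Cost of the plan that pushes the limit and project down to Cassandra.
    static double cassandraCost(double rows, double overhead) {
        return overhead + rows * CASSANDRA_PER_ROW;
    }

    // Picks the cheaper plan; ties go to the Enumerable plan.
    static String cheaperPlan(double rows, double overhead) {
        return cassandraCost(rows, overhead) < enumerableCost(rows)
            ? "cassandra"
            : "enumerable";
    }

    public static void main(String[] args) {
        for (int fetch : new int[] {1, 5, 6, 8}) {
            System.out.println("limit " + fetch
                + " -> " + cheaperPlan(fetch, PUSHDOWN_OVERHEAD)
                + ", with zero overhead -> " + cheaperPlan(fetch, 0.0));
        }
    }
}
```

In this toy model the choice flips once the row count is large enough to amortize the fixed overhead, which resembles what I see around limit 8. If the real cost model behaves similarly, I assume the fix would be to make the pushed-down nodes cheap enough at small row counts, but I do not know which computeSelfCost() to change.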
2. Suppose I run an inner join query that spans Cassandra and another
database, such as MongoDB. An example, for illustration purposes only:
"with\n" +
"tweet as (\n" +
" select \"tweet_id\" as id from \"twissandra\".\"userline\"\n" +
" where \"username\"='!PUBLIC!' order by \"time\")\n," +
"metrics as (\n" +
" select \"id\" as id, \"metricName\", \"value\"\n" +
" from \"mongodb\".METRIC\n" +
" where \"metricName\" = 'cpu.usage.average'\n" +
")\n" +
"select tweet.id, metrics.\"value\" from tweet\n" +
"inner join metrics using(id)\n" +
"where metrics.\"value\" > 70"
According to the constructor of CassandraProjectRule, which calls
super(LogicalProject.class, "CassandraProjectRule"), it seems that this rule
is triggered whenever there is a LogicalProject. Does this mean that a
LogicalProject belonging to MongoDB will also be matched by
CassandraProjectRule? Yet when I run explain plan, the generated plan does
assign the projections correctly to Cassandra and MongoDB. How can
Calcite tell which projection belongs to which schema? Thanks.
Sincerely,
Junwei Li