You are right, we only have the row count from the source. There is no
metadata that can describe the query performance. Can you multiply the row
count by an explicit factor?
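For illustration, a per-source factor could be applied to the planner's row
estimate along these lines (a minimal Scala sketch; the factor values and the
source names are made up, and would have to be calibrated by benchmarking):

```scala
// Hypothetical per-source cost factors: how much more expensive a row from
// each backend is relative to a baseline. Values are illustrative only.
val sourceFactor: Map[String, Double] = Map(
  "jdbc" -> 1.0,
  "elasticsearch" -> 25.0
)

// Scale the planner's logical row estimate by the backend's factor,
// falling back to 1.0 for unknown sources.
def adjustedRows(estimatedRows: Double, source: String): Double =
  estimatedRows * sourceFactor.getOrElse(source, 1.0)
```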

Yang Liu <[email protected]> wrote on Thu, Feb 20, 2020 at 5:05 PM:

> Currently I can use the following code to get the cost:
>
> def getCost(rootSchema: SchemaPlus, sql: String): RelOptCost = {
>   val configBuilder = Frameworks.newConfigBuilder()
>   configBuilder.defaultSchema(rootSchema)
>   val frameworkConfig = configBuilder.build()
>   val planner = Frameworks.getPlanner(frameworkConfig)
>
>   // parse -> validate -> convert to a relational expression
>   val sqlNode = planner.parse(sql)
>   val validated = planner.validate(sqlNode)
>   val rel = planner.rel(validated).project()
>
>   // ask the metadata query for the cost of the root node
>   // (non-cumulative: this node only, excluding its inputs)
>   val mq = rel.getCluster().getMetadataQuery
>   mq.getNonCumulativeCost(rel)
> }
>
> but it seems not quite correct. For example:
>
> Between two MySQL tables:
> PLAN: EnumerableCalc(expr#0..3=[{inputs}], proj#0..2=[{exprs}])
>   EnumerableHashJoin(condition=[=($0, $3)], joinType=[inner])
>     JdbcToEnumerableConverter
>       JdbcTableScan(table=[[perm, user_table]])
>     JdbcToEnumerableConverter
>       JdbcProject(id=[$0])
>         JdbcTableScan(table=[[insight_user, user_tab]])
> {1500.0 rows, 4500.0 cpu, 0.0 io}
>
>
> Between Elasticsearch and MySQL:
> PLAN: EnumerableLimit(fetch=[10])
>   EnumerableHashJoin(condition=[=($2, $8)], joinType=[inner])
>     JdbcToEnumerableConverter
>       JdbcTableScan(table=[[insight_user, user_tab]])
>     ElasticsearchToEnumerableConverter
>       ElasticsearchProject(status=[ITEM($0, 'status')], user=[ITEM($0, 'user')])
>         ElasticsearchTableScan(table=[[es, es_table_1]])
> {10.0 rows, 828.9306334778566 cpu, 0.0 io}
>
> In practice the latter is much slower than the former, since the
> Elasticsearch table is very large.
>
> It seems the planner cost is a logical estimate. May I know the correct way
> to use it?
>
> Thanks
>
> On 2020/02/20 03:00:47, Yang Liu <[email protected]> wrote:
> > Hi all,
> >
> > Digging into the docs, I found getJoinRowCount
> > <
> https://calcite.apache.org/apidocs/org/apache/calcite/rel/metadata/RelMdUtil.html#getJoinRowCount(org.apache.calcite.rel.metadata.RelMetadataQuery,org.apache.calcite.rel.core.Join,org.apache.calcite.rex.RexNode)
> >
> > which seems to be the very thing I am looking for: I can reject the join
> > when the estimated rows are too many. Any ideas?
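For an inner join, the estimate getJoinRowCount produces is roughly left rows
times right rows times the selectivity of the join condition. A plain-Scala
sketch of that estimate plus an application-side rejection guard (the
threshold value is illustrative, not a Calcite default):

```scala
// Sketch of the kind of estimate getJoinRowCount makes for an inner join:
// left rows * right rows * selectivity of the join condition.
def estimateJoinRows(leftRows: Double, rightRows: Double,
                     selectivity: Double): Double =
  leftRows * rightRows * selectivity

// Application-side guard: refuse to run a join whose estimated result size
// exceeds what we can safely hold in memory. maxRows is illustrative.
def rejectJoin(estimatedRows: Double, maxRows: Double = 1e6): Boolean =
  estimatedRows > maxRows
```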
> >
> > Yang Liu <[email protected]> wrote on Mon, Feb 17, 2020 at 5:13 PM:
> >
> > > Thanks Muhammad, could you give a more detailed description?
> > > Currently I have found RelOptCost; is this the one you are referring
> > > to?
> > > And I need to clarify: I do not mind the cost incurred in the data
> > > sources, since that will not affect my application; it may just take
> > > longer to get the result if the cost is high.
> > > I care about the cost in our application and am afraid of OOM.
> > >
> > > Thanks
> > >
> > > Muhammad Gelbana <[email protected]> wrote on Mon, Feb 17, 2020 at 4:26 AM:
> > >
> > >> If your only concern is memory utilization, I would try estimating it
> > >> using the plan's cost. But I guess you'll have to run some tests to
> > >> work out the ranges you can accept.
> > >>
> > >>
> > >> On Sun, Feb 16, 2020 at 5:50 PM Yang Liu <[email protected]> wrote:
> > >>
> > >> > Is it possible to put some limitations on the SQL to make sure our
> > >> > application, which depends on Calcite, is "safe"? For example, when
> > >> > doing a merge join between two large data sources, our application
> > >> > may OOM, since the join runs in memory. If we had such a "limitation
> > >> > mechanism", we could refuse to execute the join to avoid OOM.
> > >> >
> > >> > Or can we only do the check outside Calcite?
> > >> >
> > >> > Thanks
> > >> >
> > >>
> > >
> >
>
