Ted Xu commented on CALCITE-1933:

Thanks [~julianhyde] for the reply.

You're right, expression cost can be provided as metadata. In fact, I was 
proposing to use Schema because we treat expression cost as SqlOperator 
attribute _in our own implementation_. This makes users' hint (settings in 
metadata service or annotations on UDF code) injection more convenient. Here we 
say expression cost we mean _expression complexity_, the underlying concept 

1. Certain cost of a call to given SqlOperator is a constant
2. The cost is irrelevant to the input data

On second thought, I found above assumptions was too simple. The cost or 
complexity of expression may depend on:

1. Data types (e.g., cost of cast from string to double and integer to double 
is different)
2. Input data value, distribution, ordering, etc. (e.g., like '%foobar%' and 
like '%foo%bar%' is different)
3. Execution engine implementation details, like caching\[1\], code generation 
and vectorized execution.

If we treat expression cost as metadata, the interface could be simplified as 

interface Handler extends MetadataHandler<ExpressionComplexity> {
  Double getExpressionComplexity(RelNode r, RelMetadataQuery mq,
      RexNode expression, RexNode predicate);

It looks not necessarily like a 'core' functionality and makes no much 
difference to NonCummulativeCost itself. But still it would help estimating 
filter cost.

I'm not sure if I explained my idea clearly. What do you think? Shall we 
introduce such a metadata?

\[1\] Stonebraker, M., & Hellerstein, J. M. (n.d.). Optimizing Predicate 
Migration : Queries with Expensive Predicates, 267–276.

> Cost estimation for expression evaluate
> ---------------------------------------
>                 Key: CALCITE-1933
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1933
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Ted Xu
>            Assignee: Julian Hyde
> Expression evaluate performance is different one to another based on its 
> operator type and underlying execution engine. For example, a regex pattern 
> matching (e.g., col like 'foo%bar%') may requires 100 times more cost 
> compares to a simple comparison (e.g., col > 3). Currently we don't have this 
> kind of differentiation.
> Expression cost will grant following applications:
> 1. Improved predicate push down: predicates can be pushed down to the right 
> position based on row count and its own cost.
> 2. Predicate ordering rule: predicate with different order, e.g.,
> {code:sql}
> col1 like 'foo%bar%' and col2 > 3
> col2 > 3 and col1 like 'foo%bar%'
> {code}
> may have different performance, introducing a predicate ordering rule will 
> help find a better one.
> The cost of each RexCall / RexAggCall is provided by Schema, stored in an 
> external metadata service.

This message was sent by Atlassian JIRA

Reply via email to