[ https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202938#comment-17202938 ]

Julian Hyde commented on CALCITE-4223:
--------------------------------------

I don't see how Flink's {{ColumnStats}} and Drill's {{ColumnStatistics}} play
into this. I would expect each engine (Flink and Drill, in this case) to have
its own data structure(s) for storing statistics. But then we need to make those
statistics available to rules in Calcite that are not aware of the engine and
its particular statistics data structure.

I don't know why Flink and Drill have not integrated their statistics into 
Calcite. Maybe they didn't know how. They could have asked. Or we could have 
written better documentation.

Using a single Java interface to hold all statistics is a poor choice for
extensibility. Let's suppose that
we add your {{interface ColumnStatistics}} to Calcite. Let's suppose that Drill 
creates {{interface DrillColumnStatistics extends ColumnStatistics}} with one 
extra method, and Flink creates {{interface FlinkColumnStatistics extends 
ColumnStatistics}} with two extra methods. Now there's no interface with all of 
the extra methods.
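To make the problem concrete, here is a minimal Java sketch of that scenario. Every type in it is hypothetical: the extra methods {{avgColumnLength}}, {{nullCount}} and {{trueCount}} are made up for illustration and do not correspond to real Drill or Flink APIs. It shows that a rule written against the shared interface can only reach the extra methods by downcasting to Drill or Flink types, which is exactly the engine awareness that rules in Calcite should not need.

```java
/**
 * Hypothetical sketch of the single-interface design discussed above.
 * None of these types exist in Calcite, Drill or Flink.
 */
public class WideInterfaceProblem {
  /** Hypothetical shared interface, as proposed in CALCITE-4223. */
  interface ColumnStatistics {
    double ndv();
  }

  /** Hypothetical Drill extension with one extra method. */
  interface DrillColumnStatistics extends ColumnStatistics {
    double avgColumnLength();
  }

  /** Hypothetical Flink extension with two extra methods. */
  interface FlinkColumnStatistics extends ColumnStatistics {
    long nullCount();
    long trueCount();
  }

  /**
   * A rule in Calcite only sees ColumnStatistics. To use any extra method
   * it has to downcast to an engine-specific type, i.e. it has to know
   * about Drill and Flink.
   */
  static String describe(ColumnStatistics stats) {
    StringBuilder sb = new StringBuilder("ndv=" + stats.ndv());
    if (stats instanceof DrillColumnStatistics) {
      sb.append(" avgLen=").append(((DrillColumnStatistics) stats).avgColumnLength());
    }
    if (stats instanceof FlinkColumnStatistics) {
      sb.append(" nulls=").append(((FlinkColumnStatistics) stats).nullCount());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    DrillColumnStatistics drill = new DrillColumnStatistics() {
      public double ndv() { return 100.0; }
      public double avgColumnLength() { return 8.5; }
    };
    System.out.println(describe(drill)); // prints "ndv=100.0 avgLen=8.5"
  }
}
```

An engine that had both sets of statistics would need yet another interface extending both, and every rule that wanted to use them would have to learn about that type too.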

Calcite's approach is to make each statistic an interface with one method (or 
occasionally two, if closely related). So an engine can implement the ones it 
has, and ignore the others. It is a better extensibility story than what you 
propose.
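A minimal sketch of the one-interface-per-statistic style, assuming a simplified setup where a table's statistics provider is a plain object. All interface and class names are hypothetical; the idea is loosely in the spirit of Calcite's metadata interfaces (e.g. {{BuiltInMetadata.DistinctRowCount}}), but the sketch does not use the real {{RelMetadataQuery}} machinery. Rule code depends only on the small statistic interfaces, so it never needs to know which engine supplied them.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the "one interface per statistic" style. It does
 * NOT use Calcite's actual metadata API; all names are illustrative.
 */
public class NarrowInterfaces {
  // Each statistic is its own small interface.
  interface DistinctCount { double distinctCount(int column); }
  interface NullCount { long nullCount(int column); }
  interface AverageColumnSize { double averageColumnSize(int column); }

  /** A Drill-like engine implements only the statistics it has. */
  static final class DrillTableStats implements DistinctCount, AverageColumnSize {
    public double distinctCount(int column) { return 100.0; }
    public double averageColumnSize(int column) { return 8.5; }
  }

  /** A Flink-like engine implements a different subset. */
  static final class FlinkTableStats implements DistinctCount, NullCount {
    public double distinctCount(int column) { return 40.0; }
    public long nullCount(int column) { return 50L; }
  }

  /** Returns the provider viewed as the requested statistic, if it implements it. */
  static <T> Optional<T> find(Object provider, Class<T> statistic) {
    if (statistic.isInstance(provider)) {
      return Optional.of(statistic.cast(provider));
    }
    return Optional.empty();
  }

  /**
   * Engine-agnostic rule code: it depends only on NullCount, never on an
   * engine type, and falls back to a placeholder guess (10%) when the
   * engine does not provide null counts.
   */
  static double nullFraction(Object stats, int column, double rowCount) {
    return find(stats, NullCount.class)
        .map(nc -> nc.nullCount(column) / rowCount)
        .orElse(0.1);
  }

  public static void main(String[] args) {
    System.out.println(nullFraction(new DrillTableStats(), 0, 1000.0)); // 0.1
    System.out.println(nullFraction(new FlinkTableStats(), 0, 1000.0)); // 0.05
  }
}
```

The 10% fallback is an arbitrary placeholder. The design point is that an engine with no null counts simply does not implement {{NullCount}}, and neither the rule nor the other engines have to change.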

> Introducing column statistics to RelOptTable
> --------------------------------------------
>
>                 Key: CALCITE-4223
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4223
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Chunwei Lei
>            Assignee: Chunwei Lei
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Many systems depend on column statistics to compute more accurate stats, such 
> as NDV, average column size, and so on. It would be nice if Calcite could 
> provide such an interface.
> Column statistics might include NDV, average/max column length, number of 
> nulls, number of trues, number of falses and so on. 
> What do you think?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
