[https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202938#comment-17202938]
Julian Hyde commented on CALCITE-4223:
--------------------------------------
I don't see how Flink's {{ColumnStats}} and Drill's {{ColumnStatistics}} play
into this. I would expect each engine, Flink and Drill in this case, to have
its own data structure(s) to store statistics. But then we need to make those
statistics available to rules in Calcite that are not aware of the engine and
its particular statistics data structure.
I don't know why Flink and Drill have not integrated their statistics into
Calcite. Maybe they didn't know how. They could have asked. Or we could have
written better documentation.
Using a Java interface is a poor choice for extensibility. Let's suppose that
we add your {{interface ColumnStatistics}} to Calcite. Let's suppose that Drill
creates {{interface DrillColumnStatistics extends ColumnStatistics}} with one
extra method, and Flink creates {{interface FlinkColumnStatistics extends
ColumnStatistics}} with two extra methods. Now there's no interface with all of
the extra methods.
Calcite's approach is to make each statistic an interface with one method (or
occasionally two, if closely related). So an engine can implement the ones it
has, and ignore the others. It is a better extensibility story than what you
propose.
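A simplified sketch of the one-statistic-per-interface style, for contrast. The interfaces and classes below ({{ColumnNdv}}, {{ColumnNullCount}}, {{ColumnAverageSize}}, {{DrillLikeStats}}, {{FlinkLikeStats}}) are hypothetical illustrations, not Calcite's real metadata API (Calcite's own metadata interfaces live in {{BuiltInMetadata}} and are wired through metadata handlers). Each engine implements only the statistics it has, and a rule asks for exactly the one it needs.

```java
import java.util.Optional;

// Hypothetical: one small interface per statistic.
interface ColumnNdv {
  double ndv(int column);
}

interface ColumnNullCount {
  long nullCount(int column);
}

interface ColumnAverageSize {
  double averageSize(int column);
}

// A Drill-like engine implements only the statistics it has.
final class DrillLikeStats implements ColumnNdv, ColumnAverageSize {
  public double ndv(int column) { return 100; }
  public double averageSize(int column) { return 8.5; }
}

// A Flink-like engine implements a different subset.
final class FlinkLikeStats implements ColumnNdv, ColumnNullCount {
  public double ndv(int column) { return 250; }
  public long nullCount(int column) { return 3; }
}

public class SingleMethodStatsDemo {
  // An engine-agnostic rule asks for the one statistic it needs,
  // without knowing which engine supplied the object.
  static Optional<Long> nullCount(Object stats, int column) {
    if (stats instanceof ColumnNullCount) {
      return Optional.of(((ColumnNullCount) stats).nullCount(column));
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(nullCount(new DrillLikeStats(), 0)); // Optional.empty
    System.out.println(nullCount(new FlinkLikeStats(), 0)); // Optional[3]
  }
}
```

If the Drill-like engine later gains null counts, it only adds {{implements ColumnNullCount}}, and an engine that has all three statistics simply implements all three interfaces; no combined "super" interface is ever needed.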
> Introducing column statistics to RelOptTable
> --------------------------------------------
>
> Key: CALCITE-4223
> URL: https://issues.apache.org/jira/browse/CALCITE-4223
> Project: Calcite
> Issue Type: Improvement
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Many systems depend on column statistics to compute more accurate stats, such
> as NDV, average column size, and so on. It would be nice if Calcite could
> provide such an interface.
> Column statistics might include NDV, average/max column length, number of
> nulls, number of trues, number of falses and so on.
> What do you think?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)