[ https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192721#comment-17192721 ]
Julian Hyde commented on CALCITE-4223:
--------------------------------------
bq. 1) Do you agree to introduce column statistics?
Yes. I believe that {{RelMdPopulationSize}} and {{RelMdDistinctRowCount}} give you unfiltered and filtered NDV, respectively.
But feel free to propose other statistics.
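Here is a minimal, self-contained sketch of the difference between the two. Population size is the NDV of a column set over all rows. Distinct row count with a predicate is the NDV over only the rows that pass that predicate. The sketch is plain Java with no Calcite dependency. {{NdvSketch}} and its methods are hypothetical names that only mirror the Calcite concepts, and exact counting with a {{HashSet}} stands in for whatever estimate a real table would keep.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of unfiltered vs. filtered NDV; not Calcite API.
final class NdvSketch {
  /** NDV of column {@code col} over all rows (cf. population size). */
  static long populationSize(List<Object[]> rows, int col) {
    return distinctRowCount(rows, col, row -> true);
  }

  /** NDV of column {@code col} over the rows that satisfy {@code predicate}
   *  (cf. distinct row count with a predicate). */
  static long distinctRowCount(List<Object[]> rows, int col, Predicate<Object[]> predicate) {
    Set<Object> seen = new HashSet<>();
    for (Object[] row : rows) {
      if (predicate.test(row)) {
        seen.add(row[col]);
      }
    }
    return seen.size();
  }

  public static void main(String[] args) {
    // Rows of (deptno, job).
    List<Object[]> emp = List.of(
        new Object[] {10, "CLERK"},
        new Object[] {10, "MANAGER"},
        new Object[] {20, "CLERK"},
        new Object[] {30, "SALESMAN"});
    System.out.println(populationSize(emp, 0));                               // 3
    System.out.println(distinctRowCount(emp, 0, r -> "CLERK".equals(r[1])));  // 2
  }
}
```

The filtered figure can never exceed the unfiltered one, which is why population size is a useful upper bound when no predicate-specific estimate is available.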
bq. 2) If so, where should we put them? (RelOptTable? Statistics? Or other places?)
You will need to do two things: first, store the statistics; second, make them accessible (e.g. to planner rules, other statistics, and the cost model).
To store them, you will very likely put a data structure such as a sketch in your {{Table}} and make it accessible via the {{RelOptTable}} that wraps it. Both of these can implement {{Wrapper}} to give access to the data structures holding the statistics.
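One way the storage side could look, sketched under stated assumptions: the table owns a statistics object, and the optimizer-side wrapper delegates {{unwrap}} to the table. Callers can then reach the statistics without knowing the concrete table class. To keep the sketch compilable without Calcite, {{Wrapper}} is a local stand-in with the same {{unwrap(Class)}} shape as Calcite's interface. {{ColumnStatistics}}, {{StatTable}} and {{StatRelOptTable}} are hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch: statistics stored in a table, reachable through unwrap(). */
final class WrapperSketch {
  /** Local stand-in for Calcite's Wrapper interface (same unwrap shape). */
  interface Wrapper {
    <C> C unwrap(Class<C> aClass);
  }

  /** Hypothetical per-table statistics: NDV keyed by column ordinal. */
  static final class ColumnStatistics {
    private final Map<Integer, Long> ndvByColumn;

    ColumnStatistics(Map<Integer, Long> ndvByColumn) {
      this.ndvByColumn = Map.copyOf(ndvByColumn);
    }

    /** Returns the NDV of a column, or null if no statistic is stored. */
    Long ndv(int column) {
      return ndvByColumn.get(column);
    }
  }

  /** Hypothetical table that owns its statistics (plays the role of Table). */
  static final class StatTable implements Wrapper {
    private final ColumnStatistics stats;

    StatTable(ColumnStatistics stats) {
      this.stats = stats;
    }

    @Override public <C> C unwrap(Class<C> aClass) {
      if (aClass.isInstance(this)) return aClass.cast(this);
      if (aClass.isInstance(stats)) return aClass.cast(stats);
      return null;  // nothing of the requested type
    }
  }

  /** Hypothetical optimizer-side wrapper (plays the role of RelOptTable). */
  static final class StatRelOptTable implements Wrapper {
    private final Wrapper table;

    StatRelOptTable(Wrapper table) {
      this.table = table;
    }

    @Override public <C> C unwrap(Class<C> aClass) {
      if (aClass.isInstance(this)) return aClass.cast(this);
      return table.unwrap(aClass);  // delegate so callers reach the table's statistics
    }
  }

  public static void main(String[] args) {
    StatRelOptTable relOptTable = new StatRelOptTable(
        new StatTable(new ColumnStatistics(Map.of(0, 3L, 1, 42L))));
    ColumnStatistics stats = relOptTable.unwrap(ColumnStatistics.class);
    System.out.println(stats == null ? "no statistics" : "NDV(col 0) = " + stats.ndv(0));
  }
}
```

Returning {{null}} for an unknown class lets a caller fall back to default estimates when a table carries no statistics.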
Then you should make them accessible via new or existing statistics methods along the lines of {{RelMetadataQuery.getPopulationSize(RelNode, ImmutableBitSet)}}. You will obviously want to implement them for {{TableScan}}, but should try to implement them for other {{RelNode}} subtypes as well.
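The following is a hedged sketch of what a population-size handler might do for a scan and for two other operator types. The plan classes are hypothetical stand-ins for {{TableScan}}, {{Filter}} and {{Project}}, and {{java.util.BitSet}} stands in for {{ImmutableBitSet}}. Combining per-column NDVs by multiplication, capped at the row count, assumes the columns are independent. None of this is Calcite's actual implementation.

```java
import java.util.BitSet;
import java.util.Map;

/** Hypothetical sketch of a population-size estimate over a tiny plan tree. */
final class PopulationSizeSketch {
  /** Minimal stand-ins for RelNode subtypes (not Calcite classes). */
  interface Rel {}

  static final class Scan implements Rel {
    final double rowCount;
    final Map<Integer, Long> ndvByColumn;  // e.g. unwrapped from the table's statistics

    Scan(double rowCount, Map<Integer, Long> ndvByColumn) {
      this.rowCount = rowCount;
      this.ndvByColumn = ndvByColumn;
    }
  }

  static final class Filter implements Rel {
    final Rel input;

    Filter(Rel input) {
      this.input = input;
    }
  }

  static final class Project implements Rel {
    final Rel input;
    final int[] sourceColumns;  // output column i reads input column sourceColumns[i]

    Project(Rel input, int[] sourceColumns) {
      this.input = input;
      this.sourceColumns = sourceColumns;
    }
  }

  /** Returns the estimated NDV of the column set, or null if unknown. */
  static Double populationSize(Rel rel, BitSet columns) {
    if (rel instanceof Scan) {
      Scan scan = (Scan) rel;
      double product = 1;
      for (int c = columns.nextSetBit(0); c >= 0; c = columns.nextSetBit(c + 1)) {
        Long ndv = scan.ndvByColumn.get(c);
        if (ndv == null) return null;  // no statistics for this column
        product *= ndv;
      }
      // Assumes independent columns; a combination cannot exceed the row count.
      return Math.min(product, scan.rowCount);
    }
    if (rel instanceof Filter) {
      // Population size ignores filtering, so ask the input.
      return populationSize(((Filter) rel).input, columns);
    }
    if (rel instanceof Project) {
      Project project = (Project) rel;
      BitSet inputColumns = new BitSet();
      for (int c = columns.nextSetBit(0); c >= 0; c = columns.nextSetBit(c + 1)) {
        inputColumns.set(project.sourceColumns[c]);
      }
      return populationSize(project.input, inputColumns);
    }
    return null;  // operator type not handled
  }

  public static void main(String[] args) {
    Rel scan = new Scan(1000, Map.of(0, 10L, 1, 50L));
    Rel plan = new Project(new Filter(scan), new int[] {1, 0});  // swaps the two columns
    BitSet first = new BitSet();
    first.set(0);  // output column 0 reads input column 1
    System.out.println(populationSize(plan, first));  // 50.0
  }
}
```

A filter passes the request straight to its input because population size ignores filtering. A project remaps output column ordinals to input ordinals before asking its input.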
> Introducing column statistics to RelOptTable
> --------------------------------------------
>
> Key: CALCITE-4223
> URL: https://issues.apache.org/jira/browse/CALCITE-4223
> Project: Calcite
> Issue Type: Improvement
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Major
>
> Many systems depend on column statistics, such as NDV and average column size,
> to compute more accurate estimates. It would be nice if Calcite could provide
> such an interface.
> Column statistics might include NDV, average/max column length, number of
> nulls, number of true values, number of false values, and so on.
> What do you think?
>
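The statistics listed in the description could be gathered in one pass over a column's values. The sketch below makes three assumptions: counting is exact (a real system would likely use an approximate sketch for NDV on large tables), NDV counts non-null values only, and length means the length of each value's {{toString()}}. {{ColumnStats}} is a hypothetical class, not an existing Calcite type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical per-column statistics, computed exactly in one pass. */
final class ColumnStats {
  final long rowCount;
  final long ndv;          // distinct non-null values
  final long nullCount;
  final long trueCount;    // only meaningful for BOOLEAN columns
  final long falseCount;   // only meaningful for BOOLEAN columns
  final double avgLength;  // average toString() length of non-null values
  final int maxLength;     // maximum toString() length of non-null values

  private ColumnStats(long rowCount, long ndv, long nullCount, long trueCount,
      long falseCount, double avgLength, int maxLength) {
    this.rowCount = rowCount;
    this.ndv = ndv;
    this.nullCount = nullCount;
    this.trueCount = trueCount;
    this.falseCount = falseCount;
    this.avgLength = avgLength;
    this.maxLength = maxLength;
  }

  static ColumnStats of(List<?> values) {
    Set<Object> distinct = new HashSet<>();
    long nulls = 0, trues = 0, falses = 0, totalLength = 0;
    int maxLength = 0;
    for (Object v : values) {
      if (v == null) {
        nulls++;
        continue;
      }
      distinct.add(v);
      if (Boolean.TRUE.equals(v)) {
        trues++;
      } else if (Boolean.FALSE.equals(v)) {
        falses++;
      }
      int len = v.toString().length();
      totalLength += len;
      maxLength = Math.max(maxLength, len);
    }
    long nonNull = values.size() - nulls;
    double avg = nonNull == 0 ? 0.0 : (double) totalLength / nonNull;
    return new ColumnStats(values.size(), distinct.size(), nulls, trues, falses, avg, maxLength);
  }

  public static void main(String[] args) {
    ColumnStats s = ColumnStats.of(Arrays.asList("ab", null, "abcd", "ab"));
    System.out.println("ndv=" + s.ndv + " nulls=" + s.nullCount
        + " avgLen=" + s.avgLength + " maxLen=" + s.maxLength);
  }
}
```

Keeping all counters in one object means a table only has to hold one structure per column, which fits the approach of exposing it through {{Wrapper}}.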