[
https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190504#comment-17190504
]
Chunwei Lei commented on CALCITE-4223:
--------------------------------------
[~julianhyde], column statistics might include NDV (number of distinct values),
average/max column length, number of nulls, number of trues, number of falses,
and TopK. Some systems, such as Hive [1], provide a command to collect these
stats. Provided we have such column stats, we can:
1) get a more accurate NDV for a table scan than an estimate would give.
2) estimate the size of a Join's inputs more accurately when the column types
include varchar (because we have the average column length), which helps decide
whether to use HashJoin or MergeJoin.
[1] https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ColumnStatistics
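
To make the two points above concrete, here is a minimal sketch in Java. Everything in it is hypothetical: ColumnStats, distinctRowCount, estimateInputBytes and chooseJoin are illustrative names, not existing Calcite APIs, and both the "rowCount / 10" fallback and the "hash join if the smaller input fits a memory budget" rule are placeholder assumptions rather than Calcite's actual heuristics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ColumnStatsSketch {

  /** Hypothetical per-column statistics holder (not a Calcite class). */
  static final class ColumnStats {
    final long ndv;          // number of distinct values
    final double avgLength;  // average column length in bytes
    final long maxLength;    // maximum column length in bytes
    final long nullCount;    // number of nulls

    ColumnStats(long ndv, double avgLength, long maxLength, long nullCount) {
      this.ndv = ndv;
      this.avgLength = avgLength;
      this.maxLength = maxLength;
      this.nullCount = nullCount;
    }
  }

  /**
   * Point 1: use the collected NDV when stats exist, capped at the row count.
   * Without stats, fall back to an arbitrary placeholder guess (rowCount / 10).
   */
  static double distinctRowCount(ColumnStats stats, double rowCount) {
    if (stats == null) {
      return rowCount / 10.0; // assumption: stand-in for a planner's default guess
    }
    return Math.min(stats.ndv, rowCount);
  }

  /** Point 2: input size is the row count times the sum of average column lengths. */
  static double estimateInputBytes(double rowCount, Map<String, ColumnStats> columns) {
    double avgRowBytes = 0;
    for (ColumnStats s : columns.values()) {
      avgRowBytes += s.avgLength;
    }
    return rowCount * avgRowBytes;
  }

  /** Assumed rule: pick a hash join when the smaller input fits the memory budget. */
  static String chooseJoin(double leftBytes, double rightBytes, double memoryBudgetBytes) {
    return Math.min(leftBytes, rightBytes) <= memoryBudgetBytes ? "HashJoin" : "MergeJoin";
  }

  public static void main(String[] args) {
    Map<String, ColumnStats> columns = new LinkedHashMap<>();
    columns.put("id", new ColumnStats(1000, 8.0, 8, 0));
    columns.put("name", new ColumnStats(900, 24.0, 64, 10)); // varchar column
    double leftBytes = estimateInputBytes(1000, columns);
    System.out.println(chooseJoin(leftBytes, 1e9, 64_000));
  }
}
```

The varchar column is where the average length matters most: without it, the planner would have to assume some fixed width for "name", and the size estimate that drives the join choice could be far off.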
> Introducing column statistics to RelOptTable
> --------------------------------------------
>
> Key: CALCITE-4223
> URL: https://issues.apache.org/jira/browse/CALCITE-4223
> Project: Calcite
> Issue Type: Improvement
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Major
>
> Many systems depend on column statistics, such as NDV and average column
> size, to compute more accurate stats. It would be nice if Calcite could
> provide such an interface.
> Column statistics might include NDV, average/max column length, number of
> nulls, number of trues, number of falses and so on.
> What do you think?
>