[
https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197526#comment-17197526
]
Julian Hyde commented on CALCITE-4223:
--------------------------------------
I see in the PR you have created {{interface ColStatistics}} and added a method
to {{RelOptTable}} to get it.
As I said above, I still think that this is not the right approach. It does not
make it easy for people to add new kinds of metadata, and it does not
accommodate systems whose data structures hold more information (e.g. a system
with a histogram that can return not just the number of distinct values, but
the number of distinct values between 100 and 1000).
I introduced {{interface Statistics}} to make the simple case easy. It is not a
template that we should try to extend.
Suppose you could query any metadata interface on a {{RelOptTable}} using
{{unwrap}}. Then you could easily implement metadata. For example, in
{{RelMdSize}}:
{code}
public Double averageRowSize(TableScan scan, RelMetadataQuery mq) {
  final RelOptTable table = scan.getTable();
  // Ask the table whether it implements the Size metadata interface.
  final BuiltInMetadata.Size size =
      table.unwrap(BuiltInMetadata.Size.class);
  if (size != null) {
    return size.averageRowSize();
  }
  // The table does not provide this metadata.
  return null;
}
{code}
I think that is much more elegant and straightforward.
Of course the implementor of the particular type of table will have to
implement the necessary interfaces, but I don't think that will be hard.
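
To make the table side of this concrete, here is a minimal, self-contained
sketch. It does not use Calcite's classes: {{Table}}, {{Size}} and
{{RangeDistinctCount}} are hypothetical stand-ins for {{RelOptTable}},
{{BuiltInMetadata.Size}} and a histogram-style interface, and the bucket values
are made up. It only illustrates that a table can opt in to richer metadata by
implementing an interface, while tables without statistics return null.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class UnwrapSketch {
  /** Hypothetical stand-in for RelOptTable; only unwrap is modelled. */
  interface Table {
    <C> C unwrap(Class<C> clazz);
  }

  /** Stand-in for BuiltInMetadata.Size, reduced to one method. */
  interface Size {
    Double averageRowSize();
  }

  /** Hypothetical richer metadata: distinct values within a value range. */
  interface RangeDistinctCount {
    Double distinctCount(int column, long low, long high);
  }

  /** A table with no statistics; unwrap returns null for metadata interfaces. */
  static class PlainTable implements Table {
    @Override public <C> C unwrap(Class<C> clazz) {
      return clazz.isInstance(this) ? clazz.cast(this) : null;
    }
  }

  /** A table backed by a histogram; it opts in by implementing interfaces. */
  static class HistogramTable extends PlainTable
      implements Size, RangeDistinctCount {
    // Made-up data for column 0: bucket lower bound -> distinct values.
    private final NavigableMap<Long, Double> buckets = new TreeMap<>();

    HistogramTable() {
      buckets.put(0L, 10d);
      buckets.put(100L, 40d);
      buckets.put(500L, 25d);
      buckets.put(1000L, 7d);
    }

    @Override public Double averageRowSize() {
      return 64d;
    }

    @Override public Double distinctCount(int column, long low, long high) {
      if (column != 0) {
        return null; // no histogram for other columns
      }
      // Coarse estimate: sum buckets whose lower bound lies in [low, high).
      double sum = 0;
      for (double ndv : buckets.subMap(low, true, high, false).values()) {
        sum += ndv;
      }
      return sum;
    }
  }

  /** Handler in the style of the RelMdSize example above. */
  static Double averageRowSize(Table table) {
    final Size size = table.unwrap(Size.class);
    return size != null ? size.averageRowSize() : null;
  }

  /** Handler for the hypothetical range-based distinct count. */
  static Double distinctCount(Table table, int column, long low, long high) {
    final RangeDistinctCount h = table.unwrap(RangeDistinctCount.class);
    return h != null ? h.distinctCount(column, low, high) : null;
  }

  public static void main(String[] args) {
    System.out.println(averageRowSize(new HistogramTable()));          // 64.0
    System.out.println(averageRowSize(new PlainTable()));              // null
    System.out.println(distinctCount(new HistogramTable(), 0, 100, 1000)); // 65.0
  }
}
```

Under these assumptions, adding a new kind of metadata means adding a new
interface and a handler that unwraps it; the table API itself does not change.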
> Introducing column statistics to RelOptTable
> --------------------------------------------
>
> Key: CALCITE-4223
> URL: https://issues.apache.org/jira/browse/CALCITE-4223
> Project: Calcite
> Issue Type: Improvement
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Many systems depend on column statistics to compute more accurate stats, such
> as NDV, average column size, and so on. It would be nice if Calcite could
> provide such an interface.
> Column statistics might include NDV, average/max column length, number of
> nulls, number of trues, number of falses and so on.
> What do you think?
>