[
https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200256#comment-17200256
]
Julian Hyde commented on CALCITE-4223:
--------------------------------------
[~Chunwei Lei], we don't need to change {{interface RelOptTable}} at all, and we
don't need a new {{interface ColumnStatistics}}. Instead, we should change every
metadata method that handles table scans to check whether the table provides
the relevant statistics and, if it does, return them in place of the default
estimate.
For example:
{noformat}
diff --git a/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java b/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
index 458df6b34..d50e32a51 100644
--- a/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
+++ b/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
@@ -172,6 +172,11 @@ public Double averageRowSize(RelNode rel, RelMetadataQuery mq) {
   public List<Double> averageColumnSizes(TableScan rel, RelMetadataQuery mq) {
     final List<RelDataTypeField> fields = rel.getRowType().getFieldList();
+    final BuiltInMetadata.Size size =
+        rel.getTable().unwrap(BuiltInMetadata.Size.class);
+    if (size != null && size.averageColumnSizes() != null) {
+      return size.averageColumnSizes();
+    }
     final ImmutableList.Builder<Double> list = ImmutableList.builder();
     for (RelDataTypeField field : fields) {
       list.add(averageTypeValueSize(field.getType()));
{noformat}
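The same unwrap-then-fall-back pattern can be sketched outside Calcite. The block below is only an illustration: {{Table}}, {{SizeStats}}, {{PlainTable}}, {{StatsTable}} and the byte sizes in {{defaultSize}} are hypothetical stand-ins, not Calcite classes or values. It shows a handler that prefers statistics supplied by the table and otherwise estimates from the column types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: every type here is a simplified stand-in, not a Calcite class. */
class UnwrapStatsSketch {

  /** Hypothetical stand-in for size metadata that a table may choose to expose. */
  interface SizeStats {
    /** Average size in bytes of each column, or null if unknown. */
    List<Double> averageColumnSizes();
  }

  /** Hypothetical stand-in for a table that can be unwrapped to optional capabilities. */
  interface Table {
    List<String> columnTypes();

    /** Returns this table viewed as {@code clazz}, or null if it does not support it. */
    default <T> T unwrap(Class<T> clazz) {
      return clazz.isInstance(this) ? clazz.cast(this) : null;
    }
  }

  /** A table that carries no statistics. */
  static class PlainTable implements Table {
    private final List<String> types;

    PlainTable(List<String> types) {
      this.types = types;
    }

    @Override public List<String> columnTypes() {
      return types;
    }
  }

  /** A table that also supplies its own column sizes. */
  static class StatsTable extends PlainTable implements SizeStats {
    private final List<Double> sizes;

    StatsTable(List<String> types, List<Double> sizes) {
      super(types);
      this.sizes = sizes;
    }

    @Override public List<Double> averageColumnSizes() {
      return sizes;
    }
  }

  /** Type-based default estimate; the byte counts are illustrative assumptions. */
  static double defaultSize(String type) {
    switch (type) {
      case "INTEGER": return 4d;
      case "BIGINT": return 8d;
      default: return 16d;
    }
  }

  /** Prefers statistics provided by the table; otherwise estimates from types. */
  static List<Double> averageColumnSizes(Table table) {
    final SizeStats stats = table.unwrap(SizeStats.class);
    if (stats != null && stats.averageColumnSizes() != null) {
      return stats.averageColumnSizes();
    }
    final List<Double> result = new ArrayList<>();
    for (String type : table.columnTypes()) {
      result.add(defaultSize(type));
    }
    return result;
  }

  public static void main(String[] args) {
    final List<String> types = Arrays.asList("INTEGER", "VARCHAR");
    // No statistics available: falls back to the type-based estimate.
    System.out.println(averageColumnSizes(new PlainTable(types)));
    // Statistics available: the table's own numbers win.
    System.out.println(
        averageColumnSizes(new StatsTable(types, Arrays.asList(4d, 11.5d))));
  }
}
```

The point of the design is that the table interface itself stays unchanged: a table opts in by being unwrappable to the statistics type, and tables that do not opt in keep the existing behavior.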
> Introducing column statistics to RelOptTable
> --------------------------------------------
>
> Key: CALCITE-4223
> URL: https://issues.apache.org/jira/browse/CALCITE-4223
> Project: Calcite
> Issue Type: Improvement
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Many systems depend on column statistics to compute more accurate stats, such
> as the number of distinct values (NDV), average column size, and so on. It
> would be nice if Calcite could provide such an interface.
> Column statistics might include NDV, average/max column length, number of
> nulls, number of trues, number of falses and so on.
> What do you think?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)