Juliusz Sompolski created SPARK-23445:
-----------------------------------------

             Summary: ColumnStat refactoring
                 Key: SPARK-23445
                 URL: https://issues.apache.org/jira/browse/SPARK-23445
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Juliusz Sompolski


Refactor ColumnStat to be more flexible.

* Split {{ColumnStat}} and {{CatalogColumnStat}}, just as {{CatalogStatistics}} is split from {{Statistics}}. This detaches how the statistics are stored from how they are processed in the query plan. {{CatalogColumnStat}} keeps {{min}} and {{max}} as {{String}}, so it does not depend on dataType information.
* For {{CatalogColumnStat}}, parse column names from property names in the metastore (the {{KEY_VERSION}} property), not from the metastore schema. This allows the catalog to read stats into {{CatalogColumnStat}}s even if the schema itself is not in the metastore.
* Make all fields optional. {{min}}, {{max}} and {{histogram}} for columns were optional already. Having them all optional is more consistent, and gives the flexibility to, e.g., drop some of the fields through transformations if they are difficult or impossible to calculate.

The added flexibility will make it possible to have alternative implementations for stats, and will separate stats collection from stats and estimation processing in plans.
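
The proposed split could look roughly like the following. This is a minimal, simplified sketch and not the actual Spark classes: the tiny {{DataType}} hierarchy, the helper names {{toPlanStat}}, {{fromExternalString}} and {{columnNames}}, the value of {{KEY_VERSION}}, and the property layout {{<prefix><column>.<field>}} are all assumptions made for illustration.

```scala
// Hypothetical stand-in for Spark's DataType, reduced to two cases.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType

// Plan-side stats: min/max are typed values; every field is optional.
case class ColumnStat(
    distinctCount: Option[BigInt] = None,
    min: Option[Any] = None,
    max: Option[Any] = None,
    nullCount: Option[BigInt] = None)

// Catalog-side stats: min/max stay as String, so no dataType is needed
// to read them from (or write them to) the metastore.
case class CatalogColumnStat(
    distinctCount: Option[BigInt] = None,
    min: Option[String] = None,
    max: Option[String] = None,
    nullCount: Option[BigInt] = None) {

  // Convert to the plan-side form once the column's dataType is known.
  def toPlanStat(dataType: DataType): ColumnStat = ColumnStat(
    distinctCount = distinctCount,
    min = min.map(CatalogColumnStat.fromExternalString(_, dataType)),
    max = max.map(CatalogColumnStat.fromExternalString(_, dataType)),
    nullCount = nullCount)
}

object CatalogColumnStat {
  // Assumed field name marking that stats exist for a column.
  val KEY_VERSION = "version"

  // Parse a stored string into a typed value (only two types in this sketch).
  def fromExternalString(s: String, dataType: DataType): Any = dataType match {
    case IntegerType => s.toInt
    case StringType  => s
  }

  // Recover column names from property keys alone, without a schema:
  // every key of the form "<prefix><column>.version" names one column.
  def columnNames(props: Map[String, String], prefix: String): Set[String] = {
    val suffix = "." + KEY_VERSION
    props.keys.collect {
      case k if k.length >= prefix.length + suffix.length &&
          k.startsWith(prefix) && k.endsWith(suffix) =>
        k.substring(prefix.length, k.length - suffix.length)
    }.toSet
  }
}
```

With this shape, the catalog can list the columns that have stats and load them without a schema, and the conversion to typed {{min}} and {{max}} is deferred until a plan supplies the dataType.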