Yang Jie created SPARK-56176:
--------------------------------
Summary: V2-native ANALYZE TABLE and ANALYZE COLUMN for file tables
Key: SPARK-56176
URL: https://issues.apache.org/jira/browse/SPARK-56176
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.2.0
Reporter: Yang Jie

Implement `ANALYZE TABLE COMPUTE STATISTICS` and `ANALYZE COLUMN` for V2 file
tables without delegating to the V1 commands.

- New `AnalyzeTableExec`: computes table size and row count, and persists them
  via `TableCatalog.alterTable()` + `TableChange.setProperty()`
  (`spark.sql.statistics.totalSize`, `spark.sql.statistics.numRows`).
- New `AnalyzeColumnExec`: computes column-level statistics (min, max,
  nullCount, distinctCount, avgLen, maxLen).
- `FileScan.numRows()` reads the stored row count from table properties
  (injected via `FileTable.mergedOptions` as `__numRows`).
- `DataSourceV2Strategy` routes `AnalyzeTable`/`AnalyzeColumn` for `FileTable`
  to the V2 exec nodes.
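
For readers unfamiliar with the flow, the data path described above can be
sketched in plain Java without any Spark dependency. This is an illustrative
model only, not the code in the patch: `FileTableStatsSketch`, `DataFile`,
`ColumnStats`, and the three static methods are hypothetical names, and a plain
`Map` stands in for the table properties that the real change would update
through `TableCatalog.alterTable()`. Only the two property keys are taken from
the issue text. The sketch also assumes an exact distinct count and
character-based lengths for simplicity; the actual exec nodes may compute these
differently.

```java
import java.util.*;

class FileTableStatsSketch {
    // Property keys named in the issue description.
    static final String TOTAL_SIZE = "spark.sql.statistics.totalSize";
    static final String NUM_ROWS = "spark.sql.statistics.numRows";

    /** Hypothetical stand-in for one data file of a file table. */
    static final class DataFile {
        final long sizeInBytes;
        final long rowCount;
        DataFile(long sizeInBytes, long rowCount) {
            this.sizeInBytes = sizeInBytes;
            this.rowCount = rowCount;
        }
    }

    /** Hypothetical holder for the column-level statistics listed in the issue. */
    static final class ColumnStats {
        final String min, max;
        final long nullCount, distinctCount, maxLen;
        final double avgLen;
        ColumnStats(String min, String max, long nullCount,
                    long distinctCount, double avgLen, long maxLen) {
            this.min = min; this.max = max; this.nullCount = nullCount;
            this.distinctCount = distinctCount; this.avgLen = avgLen; this.maxLen = maxLen;
        }
    }

    /** Table-level step: sum sizes and row counts, return a copy of props with both keys set. */
    static Map<String, String> analyzeTable(List<DataFile> files, Map<String, String> props) {
        long totalSize = 0, numRows = 0;
        for (DataFile f : files) {
            totalSize += f.sizeInBytes;
            numRows += f.rowCount;
        }
        Map<String, String> updated = new HashMap<>(props);
        updated.put(TOTAL_SIZE, Long.toString(totalSize));
        updated.put(NUM_ROWS, Long.toString(numRows));
        return updated;
    }

    /** Scan-side step: read the stored row count back, or empty if never analyzed. */
    static OptionalLong numRows(Map<String, String> props) {
        String v = props.get(NUM_ROWS);
        return v == null ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(v));
    }

    /** Column-level step over a string column; nulls are counted and otherwise skipped. */
    static ColumnStats analyzeColumn(List<String> values) {
        String min = null, max = null;
        long nullCount = 0, totalLen = 0, maxLen = 0;
        Set<String> distinct = new HashSet<>();
        for (String v : values) {
            if (v == null) { nullCount++; continue; }
            if (min == null || v.compareTo(min) < 0) min = v;
            if (max == null || v.compareTo(max) > 0) max = v;
            distinct.add(v);
            totalLen += v.length();
            maxLen = Math.max(maxLen, v.length());
        }
        long nonNull = values.size() - nullCount;
        double avgLen = nonNull == 0 ? 0.0 : (double) totalLen / nonNull;
        return new ColumnStats(min, max, nullCount, distinct.size(), avgLen, maxLen);
    }

    public static void main(String[] args) {
        Map<String, String> props = analyzeTable(
            Arrays.asList(new DataFile(100L, 10L), new DataFile(250L, 5L)),
            new HashMap<>());
        System.out.println(props.get(TOTAL_SIZE) + " bytes, rows=" + numRows(props));
        ColumnStats s = analyzeColumn(Arrays.asList("b", "a", null, "ccc", "a"));
        System.out.println(s.min + ".." + s.max + " nulls=" + s.nullCount
            + " distinct=" + s.distinctCount);
    }
}
```

Storing the statistics as ordinary string properties, as the issue proposes,
means a scan can recover the row count from the property map without
rereading any data files, which is the role `FileScan.numRows()` plays in the
description.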