Yang Jie created SPARK-56176:
--------------------------------

             Summary: V2-native ANALYZE TABLE and ANALYZE COLUMN for file tables
                 Key: SPARK-56176
                 URL: https://issues.apache.org/jira/browse/SPARK-56176
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Yang Jie


Implement `ANALYZE TABLE COMPUTE STATISTICS` and `ANALYZE COLUMN` for V2 file
tables natively, without delegating to the V1 commands.

A new `AnalyzeTableExec` computes the table size and row count and persists
them as table properties (`spark.sql.statistics.totalSize` and
`spark.sql.statistics.numRows`) via `TableCatalog.alterTable()` with
`TableChange.setProperty()`. A new `AnalyzeColumnExec` computes column-level
statistics (min, max, nullCount, distinctCount, avgLen, maxLen).

`FileScan.numRows()` reads the stored row count from the table properties,
where it is injected via `FileTable.mergedOptions` as `__numRows`.
`DataSourceV2Strategy` routes `AnalyzeTable` and `AnalyzeColumn` on a
`FileTable` to the new V2 exec nodes.
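
The round trip described above (compute the statistics, store them as table
properties, then surface the row count to the scan) can be illustrated with a
small stand-alone model. This is only a sketch under stated assumptions, not
the code in the patch: plain `Map`s stand in for the catalog's table properties
and for the scan options, and `AnalyzeTableSketch`, `FileInfo`,
`computeStatsProperties`, `mergeOptions` and `numRows` are hypothetical names
chosen for illustration. Only the two property keys and the `__numRows` option
key are taken from the description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Toy model only: none of these types are Spark classes.
final class AnalyzeTableSketch {
    // Property keys quoted in the issue description.
    static final String TOTAL_SIZE = "spark.sql.statistics.totalSize";
    static final String NUM_ROWS = "spark.sql.statistics.numRows";
    // Option key quoted in the issue description.
    static final String NUM_ROWS_OPTION = "__numRows";

    // Hypothetical stand-in for one data file of the table.
    static final class FileInfo {
        final long sizeInBytes;
        final long rowCount;

        FileInfo(long sizeInBytes, long rowCount) {
            this.sizeInBytes = sizeInBytes;
            this.rowCount = rowCount;
        }
    }

    // "Analyze" step: sum sizes and row counts, return the properties to persist.
    static Map<String, String> computeStatsProperties(List<FileInfo> files) {
        long totalSize = 0L;
        long numRows = 0L;
        for (FileInfo f : files) {
            totalSize += f.sizeInBytes;
            numRows += f.rowCount;
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put(TOTAL_SIZE, Long.toString(totalSize));
        props.put(NUM_ROWS, Long.toString(numRows));
        return props;
    }

    // "Merge" step: copy the stored row count, if any, into the scan options.
    static Map<String, String> mergeOptions(Map<String, String> tableProps,
                                            Map<String, String> userOptions) {
        Map<String, String> merged = new LinkedHashMap<>(userOptions);
        String stored = tableProps.get(NUM_ROWS);
        if (stored != null) {
            merged.put(NUM_ROWS_OPTION, stored);
        }
        return merged;
    }

    // "Scan" step: report a row count only when one was injected.
    static OptionalLong numRows(Map<String, String> options) {
        String value = options.get(NUM_ROWS_OPTION);
        return value == null
            ? OptionalLong.empty()
            : OptionalLong.of(Long.parseLong(value));
    }

    public static void main(String[] args) {
        List<FileInfo> files =
            List.of(new FileInfo(100L, 10L), new FileInfo(200L, 20L));
        Map<String, String> tableProps = computeStatsProperties(files);
        Map<String, String> scanOptions = mergeOptions(tableProps, Map.of());
        System.out.println(tableProps);
        System.out.println(numRows(scanOptions));
    }
}
```

In this model a table that has never been analyzed simply yields an empty
`OptionalLong`, so a caller can fall back to whatever estimate it used before.
Whether the actual patch behaves the same way is not stated in the description.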



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

