Xin Huang created SPARK-55884:
---------------------------------

             Summary: Add utility to convert CatalogStatistics to V2 Statistics 
for DSv2 connectors
                 Key: SPARK-55884
                 URL: https://issues.apache.org/jira/browse/SPARK-55884
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Xin Huang


DSv2 connectors (such as Delta Kernel) often need to use 
statistics stored in the V1 catalog (e.g., Hive Metastore) to optimize query 
planning. However, there is no shared utility in Spark to convert these V1 
`CatalogStatistics` and `CatalogColumnStat` objects into the V2 
`org.apache.spark.sql.connector.read.Statistics` and `ColumnStatistics` 
interfaces expected by DSv2 APIs.

As a result, each connector has to reimplement this conversion logic itself.

This ticket proposes adding a new utility method, 
`DataSourceV2Relation.transformV1Stats`, to perform this conversion. This would 
mirror the existing `DataSourceV2Relation.transformV2Stats` (which handles the 
reverse V2 -> V1 direction) and ensure consistent handling of:
* Table-level stats (sizeInBytes, rowCount)
* Column-level stats (min, max, nullCount, distinctCount, avgLen, maxLen)
* Histograms (converting internal Histogram types to V2 interfaces)

Placing this logic in `DataSourceV2Relation` ensures that V1 catalog classes 
remain decoupled from V2 connector interfaces.
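
To make the shape of the proposed conversion concrete, here is a minimal Java sketch of the table-level and column-level mapping listed above. It is not the proposed implementation: every class in it (`V1TableStats`, `V1ColumnStat`, `V2TableStats`, `V2ColumnStats`) is a simplified, hypothetical stand-in for the corresponding Spark type so that the example runs without Spark on the classpath. It assumes that V1 counts are arbitrary-precision integers and V1 min/max values are strings, while the V2 side exposes optional `long` values. avgLen, maxLen, and histograms are omitted for brevity.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;

// All types below are hypothetical stand-ins, not Spark's real classes.
class V1ToV2StatsSketch {

    /** Stand-in for a V1 column stat: big-integer counts, string min/max. */
    static final class V1ColumnStat {
        final Optional<BigInteger> distinctCount;
        final Optional<BigInteger> nullCount;
        final Optional<String> min;
        final Optional<String> max;

        V1ColumnStat(Optional<BigInteger> distinctCount, Optional<BigInteger> nullCount,
                     Optional<String> min, Optional<String> max) {
            this.distinctCount = distinctCount;
            this.nullCount = nullCount;
            this.min = min;
            this.max = max;
        }
    }

    /** Stand-in for V1 table stats. */
    static final class V1TableStats {
        final BigInteger sizeInBytes;
        final Optional<BigInteger> rowCount;
        final Map<String, V1ColumnStat> colStats;

        V1TableStats(BigInteger sizeInBytes, Optional<BigInteger> rowCount,
                     Map<String, V1ColumnStat> colStats) {
            this.sizeInBytes = sizeInBytes;
            this.rowCount = rowCount;
            this.colStats = colStats;
        }
    }

    /** Stand-in for V2 column statistics: optional longs, untyped min/max. */
    static final class V2ColumnStats {
        final OptionalLong distinctCount;
        final OptionalLong nullCount;
        final Optional<Object> min;
        final Optional<Object> max;

        V2ColumnStats(OptionalLong distinctCount, OptionalLong nullCount,
                      Optional<Object> min, Optional<Object> max) {
            this.distinctCount = distinctCount;
            this.nullCount = nullCount;
            this.min = min;
            this.max = max;
        }
    }

    /** Stand-in for V2 table statistics. */
    static final class V2TableStats {
        final OptionalLong sizeInBytes;
        final OptionalLong numRows;
        final Map<String, V2ColumnStats> columnStats;

        V2TableStats(OptionalLong sizeInBytes, OptionalLong numRows,
                     Map<String, V2ColumnStats> columnStats) {
            this.sizeInBytes = sizeInBytes;
            this.numRows = numRows;
            this.columnStats = columnStats;
        }
    }

    /** Narrows an optional big integer to a long; values that do not fit become empty. */
    static OptionalLong toOptionalLong(Optional<BigInteger> value) {
        return value
            .filter(b -> b.bitLength() < 64) // fits in a signed 64-bit long
            .map(b -> OptionalLong.of(b.longValue()))
            .orElse(OptionalLong.empty());
    }

    /** Converts one column. Min/max are passed through as strings in this sketch;
     *  a real utility would have to parse them according to the column's data type. */
    static V2ColumnStats convertColumn(V1ColumnStat c) {
        return new V2ColumnStats(
            toOptionalLong(c.distinctCount),
            toOptionalLong(c.nullCount),
            c.min.map(s -> (Object) s),
            c.max.map(s -> (Object) s));
    }

    /** Converts table-level stats and every column stat. */
    static V2TableStats convert(V1TableStats v1) {
        Map<String, V2ColumnStats> cols = new HashMap<>();
        v1.colStats.forEach((name, stat) -> cols.put(name, convertColumn(stat)));
        return new V2TableStats(
            toOptionalLong(Optional.of(v1.sizeInBytes)),
            toOptionalLong(v1.rowCount),
            cols);
    }
}
```

Returning an empty value when a count does not fit in a `long` is a choice made for this sketch only; the actual utility might clamp or fail instead. Whatever it does, the overflow behavior and the typed parsing of min/max are the kind of details that a shared utility would settle once for all connectors.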


