[
https://issues.apache.org/jira/browse/SPARK-55884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-55884:
-----------------------------------
Labels: pull-request-available (was: )
> Add utility to convert CatalogStatistics to V2 Statistics for DSv2 connectors
> -----------------------------------------------------------------------------
>
> Key: SPARK-55884
> URL: https://issues.apache.org/jira/browse/SPARK-55884
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Xin Huang
> Priority: Major
> Labels: pull-request-available
>
> DSv2 connectors (such as Delta Kernel) often need statistics stored in the
> V1 catalog (e.g., Hive Metastore) to optimize query planning. However, Spark
> has no shared utility to convert the V1 `CatalogStatistics` and
> `CatalogColumnStat` objects into the V2
> `org.apache.spark.sql.connector.read.Statistics` and `ColumnStatistics`
> interfaces that the DSv2 APIs expect.
> As a result, each connector has to reimplement this conversion logic
> itself.
> This ticket proposes adding a new utility method,
> `DataSourceV2Relation.transformV1Stats`, to perform this conversion. This
> would mirror the existing `DataSourceV2Relation.transformV2Stats` (which
> handles the reverse V2 -> V1 direction) and ensure consistent handling of:
> * Table-level stats (sizeInBytes, rowCount)
> * Column-level stats (min, max, nullCount, distinctCount, avgLen, maxLen)
> * Histograms (converting internal Histogram types to V2 interfaces)
>
> Placing this logic in `DataSourceV2Relation` ensures that V1 catalog classes
> remain decoupled from V2 connector interfaces.
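
To make the proposed V1 -> V2 direction concrete, here is a minimal sketch of the table-level and column-level mapping. It does not use Spark's classes: `V1TableStats`, `V1ColumnStat`, `V2TableStats`, `V2ColumnStats`, and `V1StatsConverter` are hypothetical stand-ins invented for illustration, and only a subset of the column fields is modeled (histograms, `avgLen`, and `maxLen` are left out). Dropping counts that do not fit in a `long` is also an assumption, not something the ticket specifies.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical stand-in for the V1 CatalogColumnStat (subset of fields).
final class V1ColumnStat {
    final Optional<BigInteger> distinctCount;
    final Optional<BigInteger> nullCount;
    final Optional<String> min;
    final Optional<String> max;

    V1ColumnStat(Optional<BigInteger> distinctCount, Optional<BigInteger> nullCount,
                 Optional<String> min, Optional<String> max) {
        this.distinctCount = distinctCount;
        this.nullCount = nullCount;
        this.min = min;
        this.max = max;
    }
}

// Hypothetical stand-in for the V1 CatalogStatistics.
final class V1TableStats {
    final BigInteger sizeInBytes;
    final Optional<BigInteger> rowCount;
    final Map<String, V1ColumnStat> colStats;

    V1TableStats(BigInteger sizeInBytes, Optional<BigInteger> rowCount,
                 Map<String, V1ColumnStat> colStats) {
        this.sizeInBytes = sizeInBytes;
        this.rowCount = rowCount;
        this.colStats = colStats;
    }
}

// Hypothetical stand-in for the V2 ColumnStatistics interface.
final class V2ColumnStats {
    final OptionalLong distinctCount;
    final OptionalLong nullCount;
    final Optional<String> min;
    final Optional<String> max;

    V2ColumnStats(OptionalLong distinctCount, OptionalLong nullCount,
                  Optional<String> min, Optional<String> max) {
        this.distinctCount = distinctCount;
        this.nullCount = nullCount;
        this.min = min;
        this.max = max;
    }
}

// Hypothetical stand-in for the V2 Statistics interface.
final class V2TableStats {
    final OptionalLong sizeInBytes;
    final OptionalLong numRows;
    final Map<String, V2ColumnStats> columnStats;

    V2TableStats(OptionalLong sizeInBytes, OptionalLong numRows,
                 Map<String, V2ColumnStats> columnStats) {
        this.sizeInBytes = sizeInBytes;
        this.numRows = numRows;
        this.columnStats = columnStats;
    }
}

final class V1StatsConverter {
    private V1StatsConverter() {}

    // Assumption: a value that does not fit in a long is reported as
    // "unknown" (empty) instead of being silently truncated.
    static OptionalLong toOptionalLong(Optional<BigInteger> value) {
        if (value.isPresent() && value.get().bitLength() <= 63) {
            return OptionalLong.of(value.get().longValue());
        }
        return OptionalLong.empty();
    }

    // Column-level stats: counts are narrowed, min/max are passed through
    // as strings here; a real conversion would have to decode them using
    // the column's data type.
    static V2ColumnStats transformColumn(V1ColumnStat c) {
        return new V2ColumnStats(
            toOptionalLong(c.distinctCount), toOptionalLong(c.nullCount), c.min, c.max);
    }

    // Table-level stats plus one converted entry per column.
    static V2TableStats transform(V1TableStats v1) {
        Map<String, V2ColumnStats> cols = new LinkedHashMap<>();
        for (Map.Entry<String, V1ColumnStat> e : v1.colStats.entrySet()) {
            cols.put(e.getKey(), transformColumn(e.getValue()));
        }
        return new V2TableStats(
            toOptionalLong(Optional.of(v1.sizeInBytes)), toOptionalLong(v1.rowCount), cols);
    }
}
```

In this sketch a connector would call `V1StatsConverter.transform(...)` once per table and hand the result to its scan. The actual `DataSourceV2Relation.transformV1Stats` proposed in the ticket may choose different names, signatures, and overflow handling.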
--
This message was sent by Atlassian Jira
(v8.20.10#820010)