[jira] [Updated] (SPARK-55884) Add utility to convert CatalogStatistics to V2 Statistics for DSv2 connectors

ASF GitHub Bot (Jira) Tue, 10 Mar 2026 16:49:08 -0700


     [ https://issues.apache.org/jira/browse/SPARK-55884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ASF GitHub Bot updated SPARK-55884:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add utility to convert CatalogStatistics to V2 Statistics for DSv2 connectors
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-55884
>                 URL: https://issues.apache.org/jira/browse/SPARK-55884
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Xin Huang
>            Priority: Major
>              Labels: pull-request-available
>
> DSv2 connectors (such as Delta Kernel) often need statistics stored in the V1 
> catalog (e.g., Hive Metastore) to optimize query planning. However, Spark has 
> no shared utility for converting the V1 `CatalogStatistics` and 
> `CatalogColumnStat` objects into the V2 
> `org.apache.spark.sql.connector.read.Statistics` and `ColumnStatistics` 
> interfaces that the DSv2 APIs expect. As a result, each connector has to 
> reimplement this conversion logic itself.
>
> This ticket proposes adding a new utility method, 
> `DataSourceV2Relation.transformV1Stats`, to perform the conversion. It would 
> mirror the existing `DataSourceV2Relation.transformV2Stats` (which handles the 
> reverse V2 -> V1 direction) and ensure consistent handling of:
> * Table-level stats (sizeInBytes, rowCount)
> * Column-level stats (min, max, nullCount, distinctCount, avgLen, maxLen)
> * Histograms (converting internal Histogram types to V2 interfaces)
>
> Placing this logic in `DataSourceV2Relation` keeps the V1 catalog classes 
> decoupled from the V2 connector interfaces.
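
The conversion described above can be sketched roughly as follows. This is an illustrative, self-contained Java sketch, not the proposed Spark implementation: `V1TableStats`, `V1ColumnStat`, `V2TableStats`, and `V2ColumnStats` are hypothetical stand-ins for `CatalogStatistics`, `CatalogColumnStat`, and the V2 `Statistics` and `ColumnStatistics` interfaces, with simplified field types. Histogram conversion is left out, and reporting a value that does not fit in a `long` as unknown is an assumption, not behavior taken from the ticket.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical, simplified stand-ins. These are NOT Spark's real classes.
final class V1ToV2StatsSketch {

    // Stand-in for a V1 column stat: every field may be absent.
    record V1ColumnStat(Optional<Long> distinctCount, Optional<Object> min,
                        Optional<Object> max, Optional<Long> nullCount,
                        Optional<Long> avgLen, Optional<Long> maxLen) {}

    // Stand-in for V1 table stats: size is always set, row count is optional.
    record V1TableStats(BigInteger sizeInBytes, Optional<BigInteger> rowCount,
                        Map<String, V1ColumnStat> colStats) {}

    // Stand-ins for the V2 side, which exposes numeric stats as OptionalLong.
    record V2ColumnStats(OptionalLong distinctCount, Optional<Object> min,
                         Optional<Object> max, OptionalLong nullCount,
                         OptionalLong avgLen, OptionalLong maxLen) {}

    record V2TableStats(OptionalLong sizeInBytes, OptionalLong numRows,
                        Map<String, V2ColumnStats> columnStats) {}

    // Optional<Long> -> OptionalLong, preserving "unknown".
    static OptionalLong fromOptional(Optional<Long> v) {
        return v.isPresent() ? OptionalLong.of(v.get()) : OptionalLong.empty();
    }

    // BigInteger -> OptionalLong. Assumption: a value too large for a long
    // is reported as unknown instead of being silently truncated.
    static OptionalLong fromBigInt(BigInteger v) {
        return v.bitLength() < 64 ? OptionalLong.of(v.longValue()) : OptionalLong.empty();
    }

    // Converts table-level and column-level stats; histograms are omitted here.
    static V2TableStats transformV1Stats(V1TableStats v1) {
        Map<String, V2ColumnStats> cols = new LinkedHashMap<>();
        v1.colStats().forEach((name, c) -> cols.put(name, new V2ColumnStats(
            fromOptional(c.distinctCount()), c.min(), c.max(),
            fromOptional(c.nullCount()), fromOptional(c.avgLen()),
            fromOptional(c.maxLen()))));
        OptionalLong numRows = v1.rowCount()
            .map(V1ToV2StatsSketch::fromBigInt)
            .orElse(OptionalLong.empty());
        return new V2TableStats(fromBigInt(v1.sizeInBytes()), numRows, cols);
    }
}
```

In this sketch, fields that are absent on the V1 side stay empty on the V2 side and are never defaulted to zero, so a consumer can still distinguish "unknown" from "zero".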



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

