[jira] [Created] (SPARK-39644) Add RangePartitioning to DataSource V2

Enrico Minack (Jira) Thu, 30 Jun 2022 10:24:25 -0700

Enrico Minack created SPARK-39644:
-------------------------------------

             Summary: Add RangePartitioning to DataSource V2
                 Key: SPARK-39644
                 URL: https://issues.apache.org/jira/browse/SPARK-39644
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Enrico Minack



DataSourceV2 allows data sources to report the existing partitioning of the
data they read (org.apache.spark.sql.connector.read.partitioning). Currently,
the only implementations are KeyGroupedPartitioning and UnknownPartitioning.
Data sources should also be able to report globally ordered data so that
downstream operations can exploit the ordering.

The following is required for this to work:
- Define RangePartitioning as a new implementation of Partitioning
- Add Catalyst rules that handle this partitioning
- Add a test source that reports ordering to prove that subsequent operations
that require ordering do not sort the data again.
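
As a rough illustration of the first item, the sketch below shows one shape a
RangePartitioning could take. Everything in it is an assumption made for
illustration: the Partitioning interface is a local stand-in for the one in
org.apache.spark.sql.connector.read.partitioning (reduced to numPartitions()),
and orderColumns, satisfies and the prefix rule are hypothetical names and
logic, not part of Spark. It also assumes that rows within each partition are
sorted by the same columns, and it ignores sort direction and null ordering.

```java
import java.util.List;

// Hypothetical sketch; none of these types are Spark's own.
final class RangePartitioningSketch {

    /** Local stand-in for Spark's Partitioning interface, reduced to numPartitions(). */
    interface Partitioning {
        int numPartitions();
    }

    /**
     * Assumed shape of the proposed partitioning: partitions cover
     * non-overlapping, ascending ranges of the given columns.
     */
    static final class RangePartitioning implements Partitioning {
        private final List<String> orderColumns;
        private final int numPartitions;

        RangePartitioning(List<String> orderColumns, int numPartitions) {
            if (numPartitions < 1) {
                throw new IllegalArgumentException("numPartitions must be positive");
            }
            this.orderColumns = List.copyOf(orderColumns);
            this.numPartitions = numPartitions;
        }

        @Override
        public int numPartitions() {
            return numPartitions;
        }

        List<String> orderColumns() {
            return orderColumns;
        }

        /**
         * Simplified rule (an assumption, not Catalyst's actual logic): a
         * required ordering is already met when it is a prefix of the
         * reported order columns, so no extra sort would be needed.
         */
        boolean satisfies(List<String> requiredOrdering) {
            return requiredOrdering.size() <= orderColumns.size()
                && orderColumns.subList(0, requiredOrdering.size()).equals(requiredOrdering);
        }
    }

    public static void main(String[] args) {
        RangePartitioning reported = new RangePartitioning(List.of("ts", "id"), 4);
        System.out.println(reported.satisfies(List.of("ts"))); // prints true
        System.out.println(reported.satisfies(List.of("id"))); // prints false
    }
}
```

In this sketch a source reporting data ordered by (ts, id) would let a
consumer that needs ordering by ts skip its sort, while a consumer that needs
ordering by id alone would still have to sort.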



