Enrico Minack created SPARK-39644:
-------------------------------------
Summary: Add RangePartitioning to DataSource V2
Key: SPARK-39644
URL: https://issues.apache.org/jira/browse/SPARK-39644
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.4.0
Reporter: Enrico Minack
DataSourceV2 allows data sources to report the existing partitioning of the
data they read (org.apache.spark.sql.connector.read.partitioning). Currently,
the only implementations are KeyGroupedPartitioning and UnknownPartitioning.
Data sources should also be able to report globally ordered data so that
downstream operations can exploit that ordering.
The following is required for this to work:
- Define RangePartitioning as a new implementation of Partitioning
- Add Catalyst rules that handle this partitioning
- Add a test source that reports ordering to prove that subsequent operations
that require order do not sort the data again.
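
To illustrate the idea behind the three items above, here is a minimal,
self-contained Java sketch. It does not use the Spark API: the Partitioning
interface below is a local stand-in with the same name as the Spark interface,
and RangePartitioning, orderingColumns() and SortPlanner.needsSort() are
hypothetical names invented for this example. The sketch also assumes that a
source reporting RangePartitioning guarantees that rows are sorted within each
partition and that partitions cover ascending, non-overlapping key ranges.

```java
import java.util.Arrays;
import java.util.List;

// Local stand-in for the Spark Partitioning interface, reduced to one method.
// This is NOT org.apache.spark.sql.connector.read.partitioning.Partitioning.
interface Partitioning {
    int numPartitions();
}

// Hypothetical: the RangePartitioning proposed in this issue. Assumed to mean
// that the data is globally ordered by orderingColumns across all partitions.
final class RangePartitioning implements Partitioning {
    private final List<String> orderingColumns;
    private final int numPartitions;

    RangePartitioning(List<String> orderingColumns, int numPartitions) {
        this.orderingColumns = orderingColumns;
        this.numPartitions = numPartitions;
    }

    List<String> orderingColumns() { return orderingColumns; }

    @Override
    public int numPartitions() { return numPartitions; }
}

// Local stand-in for a partitioning that carries no ordering information.
final class UnknownPartitioning implements Partitioning {
    private final int numPartitions;

    UnknownPartitioning(int numPartitions) { this.numPartitions = numPartitions; }

    @Override
    public int numPartitions() { return numPartitions; }
}

// Hypothetical planner check, standing in for a Catalyst rule.
final class SortPlanner {
    // Returns true if a sort on requiredOrder is still needed given what the
    // source reported. The sort can be skipped only when requiredOrder is a
    // prefix of the reported ordering columns.
    static boolean needsSort(Partitioning reported, List<String> requiredOrder) {
        if (requiredOrder.isEmpty()) {
            return false; // nothing to order by
        }
        if (!(reported instanceof RangePartitioning)) {
            return true; // no ordering information available
        }
        List<String> have = ((RangePartitioning) reported).orderingColumns();
        if (requiredOrder.size() > have.size()) {
            return true; // required order is finer than the reported order
        }
        return !have.subList(0, requiredOrder.size()).equals(requiredOrder);
    }
}

final class RangePartitioningSketch {
    public static void main(String[] args) {
        Partitioning ranged = new RangePartitioning(Arrays.asList("id", "ts"), 4);
        Partitioning unknown = new UnknownPartitioning(4);
        System.out.println(SortPlanner.needsSort(ranged, Arrays.asList("id")));
        System.out.println(SortPlanner.needsSort(unknown, Arrays.asList("id")));
    }
}
```

The prefix check is the important design choice here: data ordered by (id, ts)
is also ordered by (id), but not the other way round, so a required order that
is longer than or different from the reported one still needs a sort.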