Jungtaek Lim created SPARK-34255:
------------------------------------
Summary: DataSource V2: support static partitioning on required
distribution and ordering
Key: SPARK-34255
URL: https://issues.apache.org/jira/browse/SPARK-34255
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: Jungtaek Lim
SPARK-34026 added the ability for a data source to require a repartition and
sort order, but the number of partitions used for that repartition still
depends on the config (the default number of shuffle partitions).
Some special data sources may require a "static number of partitions" during
repartition, for example the state data source. Spark partitions the state by
"hash(group key) % default number of shuffle partitions", which means the
state data source must partition the same way when it rewrites the state data.
The data source therefore needs to be able to override the default number of
shuffle partitions, because the configured value is not guaranteed to be the
same as the one the state was written with, and there is also a chance that we
change the number of partitions to a non-static one (for example letting AQE
decide it, SPARK-34230).
This issue tracks the effort to support a static number of partitions during
repartition.
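
To make the constraint concrete, here is a minimal sketch in plain Python (not
Spark code) of why the partition count has to be pinned. The helper names
`partition_id` and `assign` are hypothetical, and CRC32 is only a stand-in for
Spark's own hash function; the counts 200 and 50 are arbitrary illustration
values.

```python
import zlib


def partition_id(group_key: str, num_partitions: int) -> int:
    """Map a group key to a partition as hash(group key) % num_partitions.

    Assumption: CRC32 is used here only because it is deterministic and in
    the standard library; it is NOT the hash Spark uses internally.
    """
    return zlib.crc32(group_key.encode("utf-8")) % num_partitions


def assign(keys, num_partitions):
    """Return a {key: partition id} mapping for the given partition count."""
    return {key: partition_id(key, num_partitions) for key in keys}


keys = [f"user-{i}" for i in range(1000)]

# Partition count in effect when the state was originally written.
written = assign(keys, 200)

# Rewriting with the same static count puts every key back where it was.
rewritten_same = assign(keys, 200)

# Rewriting with a different count (e.g. a changed config value) moves keys.
rewritten_other = assign(keys, 50)
moved = [key for key in keys if written[key] != rewritten_other[key]]

print(f"keys kept in place with the same count: {rewritten_same == written}")
print(f"keys moved after changing the count: {len(moved)} of {len(keys)}")
```

Any key that lands in a different partition after the rewrite would no longer
be found where the stateful operator looks for it, which is why the data
source needs to dictate the count rather than inherit it from the config.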