Jungtaek Lim created SPARK-34255:
------------------------------------

             Summary: DataSource V2: support static partitioning on required 
distribution and ordering
                 Key: SPARK-34255
                 URL: https://issues.apache.org/jira/browse/SPARK-34255
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Jungtaek Lim


SPARK-34026 added the ability for a data source to require a repartition and 
sort order, but the number of partitions used for that repartition still 
depends on the config (the default number of shuffle partitions).

Some special data sources may require a "static number of partitions" during 
repartition, for example the state data source. Spark partitions the state by 
"hash(group key) % default number of shuffle partitions", which means the state 
data source must partition the same way when it rewrites the state data. The 
data source therefore needs to be able to "change" (override) the default 
number of shuffle partitions, as the config value is not guaranteed to be the 
same as the one the state was written with, and there's also a chance that we 
change the number of partitions to a non-static one (like letting AQE decide 
it, SPARK-34230).
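
To illustrate why the partition count has to stay fixed, here is a minimal 
sketch in plain Python (not Spark code). The names partition_id and moved_keys 
are hypothetical, and CRC32 is only a deterministic stand-in for the hash 
function Spark actually uses. It shows that the partition a group key maps to 
depends on the modulus, so rewriting state with a different number of 
partitions can place keys where a reader would not look for them.

```python
import zlib


def partition_id(group_key: str, num_partitions: int) -> int:
    """Map a group key to a partition as hash(group key) % num_partitions.

    CRC32 is an assumption used for determinism; Spark uses its own hash.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(group_key.encode("utf-8")) % num_partitions


def moved_keys(keys, written_with: int, rewritten_with: int):
    """Return the keys whose partition changes between the two counts."""
    return [
        key
        for key in keys
        if partition_id(key, written_with) != partition_id(key, rewritten_with)
    ]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    # Same count on write and rewrite: every key stays in its partition.
    print(len(moved_keys(keys, 200, 200)))
    # Different count on rewrite: some keys may land in another partition.
    print(len(moved_keys(keys, 200, 50)))
```

With an identical count on both sides no key moves, which is the guarantee a 
data source gets only if it can pin the number of partitions itself instead of 
inheriting whatever the config or AQE selects at run time.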

This issue tracks the effort to support a static number of partitions during 
repartition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)