XiDuo You created SPARK-37357:
---------------------------------
Summary: Add merged last partition factor for rebalance
Key: SPARK-37357
URL: https://issues.apache.org/jira/browse/SPARK-37357
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.3.0
Reporter: XiDuo You
`Rebalance` splits large reduce partitions into smaller ones. However, we
have seen many SQL queries produce small files because of the last
partition.
Let's say we have one reduce partition and three map partitions, and the blocks
are [40, 60, 10, 10] with a target size of 100. We will get two files of
110 and 10. It gets worse when there are thousands of reduce partitions.
It would be helpful if we could merge the last small partition into the
previous one.
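
The proposed merge could be sketched roughly as follows. This is only an
illustration in plain Python, not Spark's implementation: the function name,
the `small_factor` parameter, and its default of 0.2 are all hypothetical.
It takes the output partition sizes from the example above and folds the
final partition into its predecessor when it is below
`target_size * small_factor`.

```python
from typing import List


def merge_small_last_partition(
    sizes: List[int], target_size: int, small_factor: float = 0.2
) -> List[int]:
    """Fold the final partition into the previous one when it is small.

    `small_factor` is a hypothetical knob for illustration only; it is not
    an existing Spark configuration.
    """
    # Nothing to merge into when there are fewer than two partitions.
    if len(sizes) < 2:
        return list(sizes)
    *head, prev, last = sizes
    # Merge only when the last partition is below the small-size threshold.
    if last < target_size * small_factor:
        return head + [prev + last]
    return list(sizes)


# Partition sizes from the example: two files of 110 and 10, target 100.
print(merge_small_last_partition([110, 10], target_size=100))  # [120]
```

With this sketch the example yields a single file of 120 instead of files of
110 and 10, at the cost of exceeding the target size by a bounded amount.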