ulysses you created SPARK-34226:
-----------------------------------
Summary: Reduce RepartitionOperation num partitions to its child max row
Key: SPARK-34226
URL: https://issues.apache.org/jira/browse/SPARK-34226
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: ulysses you
There is no point in repartitioning data into more partitions than it has rows.
The extra partitions only waste resources by scheduling redundant tasks.
In ETL jobs we routinely inject `repartition` or `distribute by` to reduce the
number of output partitions, but the requested partition number may still be
larger than the number of rows. It would be better to make a best effort to
eliminate these redundant partitions.
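The proposed rule can be illustrated with a minimal sketch. It assumes an upper bound on the child's row count is sometimes known, as Catalyst's `LogicalPlan.maxRows` (an `Option[Long]`) can report for a child such as `LIMIT 10`. The helper name `effective_num_partitions` is hypothetical. This only shows the capping idea and is not Spark's implementation.

```python
from typing import Optional


def effective_num_partitions(requested: int, child_max_rows: Optional[int]) -> int:
    """Cap the requested partition count at the child's maximum row count.

    Hypothetical helper illustrating the idea; not Spark code.
    `child_max_rows` is None when no upper bound on the child's rows is known.
    """
    if requested < 1:
        raise ValueError("requested must be >= 1")
    if child_max_rows is None:
        # No known bound on the child: keep the requested partition count.
        return requested
    # Never go below one partition, even if the child is known to be empty.
    return max(1, min(requested, child_max_rows))


# A `LIMIT 10` child repartitioned into 200 partitions:
# at most 10 partitions can ever receive a row.
print(effective_num_partitions(200, 10))
print(effective_num_partitions(200, None))
print(effective_num_partitions(5, 10))
```

The requested count is only ever lowered, never raised. When no bound is known, the user's `repartition` or `distribute by` setting is left untouched.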
--
This message was sent by Atlassian Jira
(v8.3.4#803005)