ulysses you created SPARK-34226:
-----------------------------------

             Summary: Reduce RepartitionOperation num partitions to its child max row
                 Key: SPARK-34226
                 URL: https://issues.apache.org/jira/browse/SPARK-34226
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: ulysses you

Repartitioning data into more partitions than there are rows serves no purpose and wastes resources on redundant tasks. In ETL jobs we often add `repartition` or `distribute by` to reduce the number of output partitions, but the requested partition number may still be larger than the number of rows. We should do our best to eliminate these redundant partitions.
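
The idea can be illustrated with a minimal sketch. It is not Spark's implementation: the function name `capped_num_partitions` and the parameter `child_max_rows` are hypothetical, and the sketch assumes an upper bound on the child's row count is available (or `None` when it is unknown). Keeping at least one partition when the bound is 0 is also an assumption.

```python
from typing import Optional


def capped_num_partitions(requested: int, child_max_rows: Optional[int]) -> int:
    """Return the partition count to use for a repartition.

    requested      -- partition number asked for by the user
    child_max_rows -- hypothetical upper bound on the child's row count,
                      or None when no bound is known
    """
    if requested < 1:
        raise ValueError("requested partition number must be positive")
    if child_max_rows is None:
        # No bound known: nothing can be safely reduced.
        return requested
    # More partitions than rows would only produce empty tasks.
    # Assumption: keep at least one partition even for an empty child.
    return max(1, min(requested, child_max_rows))


if __name__ == "__main__":
    print(capped_num_partitions(200, 3))     # 3
    print(capped_num_partitions(200, None))  # 200
    print(capped_num_partitions(4, 1000))    # 4
```

With a requested partition number of 200 and a child known to produce at most 3 rows, the sketch yields 3 partitions instead of 200, which avoids 197 empty tasks.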