[
https://issues.apache.org/jira/browse/SPARK-51505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ziqi Liu updated SPARK-51505:
-----------------------------
Description:
In some cases a shuffle is highly skewed and many partitions are empty
(probably due to a small NDV, i.e. number of distinct values). The AQE coalesce
metrics can then look confusing: a user might think AQE wrongly coalesced into
large partitions, when in fact a few partitions are very large while the
others are empty.
We should log the number of empty partitions in the metrics.
was:
There're cases where shuffle is highly skewed and many partitions (probably due
to small NDV), AQE coalesce metrics might look confusing and user might think
it wrongly coalesce to large partitions, while the actual situation is that a
few partitions are super large while others are empty.
We'd better log empty partition number in the metrics.
> Log empty partition number metrics in AQE coalesce
> --------------------------------------------------
>
> Key: SPARK-51505
> URL: https://issues.apache.org/jira/browse/SPARK-51505
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Ziqi Liu
> Priority: Major
>
> In some cases a shuffle is highly skewed and many partitions are empty
> (probably due to a small NDV, i.e. number of distinct values). The AQE
> coalesce metrics can then look confusing: a user might think AQE wrongly
> coalesced into large partitions, when in fact a few partitions are very
> large while the others are empty.
> We should log the number of empty partitions in the metrics.
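
To illustrate the situation described in the issue, here is a minimal sketch in
Python. It is not Spark's implementation: the greedy `coalesce` function, the
`summarize` helper, the metric names, and the 64 MB target size are all
assumptions made for illustration. It only shows how an empty-partition count
makes a skewed result easier to interpret.

```python
from typing import Dict, List

MB = 1024 * 1024


def coalesce(sizes: List[int], target: int) -> List[List[int]]:
    """Greedily pack adjacent partitions (simplified, hypothetical).

    A new group starts whenever adding the next partition would push the
    current group past `target` bytes.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


def summarize(sizes: List[int], target: int) -> Dict[str, int]:
    """Collect illustrative metrics, including the empty-partition count."""
    groups = coalesce(sizes, target)
    group_sizes = [sum(sizes[i] for i in group) for group in groups]
    return {
        "num_partitions": len(sizes),
        "num_empty_partitions": sum(1 for s in sizes if s == 0),
        "num_coalesced_partitions": len(groups),
        "max_coalesced_size": max(group_sizes, default=0),
    }


# Skewed shuffle: 200 partitions, only three of which hold any data.
sizes = [0] * 200
sizes[3] = 900 * MB
sizes[57] = 700 * MB
sizes[120] = 800 * MB

metrics = summarize(sizes, target=64 * MB)
print(metrics)
```

Without `num_empty_partitions`, the large `max_coalesced_size` could be read as
the coalescing step merging too aggressively. With it, the output makes clear
that the large partitions were already large before coalescing and that almost
all other partitions were empty.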