[
https://issues.apache.org/jira/browse/SPARK-51460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934133#comment-17934133
]
imarch1 zhang commented on SPARK-51460:
---------------------------------------
[~gaoyajun02] Have you encountered similar cases?
> Shuffle read and write are inconsistent when push-based shuffle is enabled
> --------------------------------------------------------------------------
>
> Key: SPARK-51460
> URL: https://issues.apache.org/jira/browse/SPARK-51460
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.3.0
> Reporter: imarch1 zhang
> Priority: Major
> Attachments: image-2025-03-11-09-11-16-656.png
>
>
> When push-based shuffle is enabled, some Spark applications in our cluster
> experience inconsistent shuffle data. The metrics of the Exchange are as follows:
> !image-2025-03-11-09-11-16-656.png!
> As the picture shows, the reduce tasks read more data than the map tasks
> wrote.
> The only clue we have found is that the number of records read by all *successful*
> reduce tasks is consistent with the number of records written, which is
> 1,529,614,111. We have not been able to determine where the additional
> 360,453 wrong records (1,529,974,564 - 1,529,614,111) in the Exchange come from.