imarch1 zhang created SPARK-51460:
-------------------------------------
Summary: Shuffle read and write are inconsistent when push-based
shuffle is enabled
Key: SPARK-51460
URL: https://issues.apache.org/jira/browse/SPARK-51460
Project: Spark
Issue Type: Bug
Components: Shuffle
Affects Versions: 3.3.0
Reporter: imarch1 zhang
When push-based shuffle is enabled, some Spark applications in our cluster
experience inconsistent shuffle data. The Exchange metrics are as follows:
!image-2025-03-11-09-04-04-265.png!
As the picture shows, reduce tasks read more data than the map tasks wrote.
The only clue we have found is that the number of records read by all
*successful* reduce tasks matches the number of records written, which is
1,529,614,111. We have not been able to determine where the additional
records (1,529,974,564 - 1,529,614,111 = 360,453) in Exchange come from.
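
To make the size of the discrepancy concrete, here is a minimal sketch of the
arithmetic using the two record counts quoted above. The function and variable
names are hypothetical, chosen for illustration only. The sketch does not query
Spark and says nothing about where the extra records come from.

```python
def record_surplus(records_read: int, records_written: int) -> int:
    """Return how many more records were read than were written."""
    return records_read - records_written


# Counts copied from the Exchange metrics quoted in this report.
records_written = 1_529_614_111  # records written by map tasks
records_read = 1_529_974_564     # records read by reduce tasks

surplus = record_surplus(records_read, records_written)
surplus_ratio = surplus / records_written  # fraction of the written count

print(f"surplus records: {surplus:,}")  # prints "surplus records: 360,453"
print(f"surplus as a fraction of records written: {surplus_ratio:.6f}")
```

The surplus is a small fraction of the total, so it may be easy to overlook
unless the read and write counts are compared directly.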
--
This message was sent by Atlassian Jira
(v8.20.10#820010)