waitinfuture commented on PR #2373: URL: https://github.com/apache/celeborn/pull/2373#issuecomment-2041062688
> Ah, I see what you mean ... `PartitionLocation` would change between retries. Yeah, this is a problem then - it will cause data loss. This would be a variant of SPARK-23207
>
> I will need to relook at the PR, and how it interacts with Celeborn - but if scenarios directly described in SPARK-23207 (or variants of it) are applicable (and we can't mitigate it), we should not proceed down this path given the correctness implications unfortunately.

Maybe we can keep both this optimization and stage rerun, but for now allow only one of them to take effect at a time by checking configs. The performance issue this PR solves does happen in production.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
