wangshengjie123 commented on PR #2373: URL: https://github.com/apache/celeborn/pull/2373#issuecomment-2051387119
> Based on my current read, this does have correctness implications. I would suggest we should do either or all of the following:
>
> a) If recomputation happens, we should fail the stage and not allow retries - this will prevent data loss.
>
> b) We should recommend enabling replication to leverage this feature - this minimizes the risk of data loss which would trigger recomputation.
>
> Thoughts?
>
> Also, how does this feature interact with `celeborn.client.shuffle.rangeReadFilter.enabled`?

Currently, if this PR is enabled, the shuffle client won't actually apply rangeReadFilter, but we can avoid enabling rangeReadFilter. Maybe we could turn off rangeReadFilter and set the shuffle stage to INDETERMINATE at the shuffle level.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
