Hi, If you call endpoint.ask[CommitFilesResponse](message), you should wait for response. If responses is successful, you can be sure commit files succeeds. Please refer to CommitHandler.requestCommitFilesWithRetry.
Thanks, Keyong Zhou <[email protected]> 于2023年7月13日周四 15:54写道: > > Following are the main steps for a shuffle stage: > > 1. LifecycleManager sends RequestSlots to Master to request slots for the > > current shuffle; > > 2. Master allocates slots among workers for the shuffle and > > returns RequestSlotsResponse; > > 3. LifecycleManager sends ReserveSlots to workers; workers do > > initialization; > > 4. ShuffleClient pushes data to workers; > > 5. When map task ends, ShuffleClient sends MapperEnd to LifecycleManager; > > 6. When all map tasks ended, LifecycleManager sends CommitFiles to > workers; > > 7. When CommitFiles succeeds, reducer tasks can read data from workers. > > Hello, > > Is there some way to use Celeborn API to check if CommitFiles succeeds in > step 6? Currently we are testing with TPC-DS 10TB data, and some heavy > query (query 24) occasionally fails with: > > Caused by: java.io.IOException: Premature EOF from inputStream > > We are speculating that this error occurs because we miss the check in > step 6. > > Thanks, > > --- Sungwoo > >
