Hi,

If you call endpoint.ask[CommitFilesResponse](message), you should wait for
response. If responses
is successful, you can be sure commit files succeeds. Please refer
to CommitHandler.requestCommitFilesWithRetry.

Thanks,
Keyong Zhou

<[email protected]> 于2023年7月13日周四 15:54写道:

> > Following are the main steps for a shuffle stage:
> > 1. LifecycleManager sends RequestSlots to Master to request slots for the
> > current shuffle;
> > 2. Master allocates slots among workers for the shuffle and
> > returns RequestSlotsResponse;
> > 3. LifecycleManager sends ReserveSlots to workers; workers do
> > initialization;
> > 4. ShuffleClient pushes data to workers;
> > 5. When map task ends, ShuffleClient sends MapperEnd to LifecycleManager;
> > 6. When all map tasks ended, LifecycleManager sends CommitFiles to
> workers;
> > 7. When CommitFiles succeeds, reducer tasks can read data from workers.
>
> Hello,
>
> Is there some way to use Celeborn API to check if CommitFiles succeeds in
> step 6? Currently we are testing with TPC-DS 10TB data, and some heavy
> query (query 24) occasionally fails with:
>
>    Caused by: java.io.IOException: Premature EOF from inputStream
>
> We are speculating that this error occurs because we miss the check in
> step 6.
>
> Thanks,
>
> --- Sungwoo
>
>

Reply via email to