SteNicholas opened a new pull request, #2932:
URL: https://github.com/apache/celeborn/pull/2932
### What changes were proposed in this pull request?
Flink supports fallback to vanilla Flink built-in shuffle implementation.
### Why are the changes needed?
When quota is unenough or workers are unavailable, `RemoteShuffleMaster`
does not support fallback to `NettyShuffleMaster`, and
`RemoteShuffleEnvironment` does not support fallback to
`NettyShuffleEnvironment` at present. Flink should support fallback to vanilla
Flink built-in shuffle implementation for unenough quota and unavailable
workers.
### Does this PR introduce _any_ user-facing change?
- Introduce `ShuffleFallbackPolicy` interface to determine whether fallback
to vanilla Flink built-in shuffle implementation.
```
/**
* The shuffle fallback policy determines whether fallback to vanilla Flink
built-in shuffle
* implementation.
*/
public interface ShuffleFallbackPolicy {
/**
* Returns whether fallback to vanilla flink built-in shuffle
implementation.
*
* @param shuffleContext The job shuffle context of Flink.
* @param celebornConf The configuration of Celeborn.
* @param lifecycleManager The {@link LifecycleManager} of Celeborn.
* @return Whether fallback to vanilla flink built-in shuffle
implementation.
*/
boolean needFallback(
JobShuffleContext shuffleContext,
CelebornConf celebornConf,
LifecycleManager lifecycleManager);
}
```
- Introduce `celeborn.client.flink.shuffle.fallback.policy` config to
support shuffle fallback policy configuration.
### How was this patch tested?
`WordCountTestBase#celeborn flink integration test with fallback - word
count`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]