xintongsong commented on code in PR #21890:
URL: https://github.com/apache/flink/pull/21890#discussion_r1105226779


##########
docs/content/docs/ops/batch/batch_shuffle.md:
##########
@@ -112,14 +112,27 @@ Hybrid shuffle provides two spilling strategies:
 
 ### Usage
 
-To use hybrid shuffle mode, you need to configure the 
[execution.batch-shuffle-mode]({{< ref "docs/deployment/config" 
>}}#execution.batch-shuffle-mode) to `ALL_EXCHANGES_HYBRID_FULL` (full spilling 
strategy) or `ALL_EXCHANGES_HYBRID_SELECTIVE` (selective spilling strategy).
+To use hybrid shuffle mode, you need to configure the 
[execution.batch-shuffle-mode]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) to `ALL_EXCHANGES_HYBRID_FULL` (full spilling 
strategy) or `ALL_EXCHANGES_HYBRID_SELECTIVE` (selective spilling strategy).
+
+#### Data Consumption Constraints
+
+Hybrid shuffle divides the partition data consumption constraints between 
producer and consumer into the following three cases:
+
+- **ALL_PRODUCERS_FINISHED** : hybrid partition data can be consumed only when 
all producers are finished.
+- **ONLY_FINISHED_PRODUCERS** : hybrid partition data can be consumed when its 
producer is finished.
+- **UNFINISHED_PRODUCERS** : hybrid partition data can be consumed even if its 
producer is un-finished.
+
+These could be configured via 
[jobmanager.partition.hybrid.partition-data-consume-constraint]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-partition-hybrid-partition-data-consume-constraint).
+
+- **For `AdaptiveBatchScheduler`** : The default constraint is 
`UNFINISHED_PRODUCERS` to perform pipelined-like shuffle. If the value is set 
to `ALL_PRODUCERS_FINISHED` or `ONLY_FINISHED_PRODUCERS`, performance may be 
degraded.
+- **If `SpeculativeExecution` is enabled** : The default constraint is 
`ONLY_FINISHED_PRODUCERS` to bring some performance optimization compared with 
blocking shuffle. Since producers and consumers have the opportunity to run at 
the same time, more speculative execution tasks may be created, and the cost of 
failover will also increase. If you want to fall back to the same behavior as 
blocking shuffle, you can configure this value to `ALL_PRODUCERS_FINISHED`. It 
is also important to note that `UNFINISHED_PRODUCERS` is not supported in this 
mode.
 
 ### Limitations
 
 Hybrid shuffle mode is still experimental and has some known limitations, 
which the Flink community is still working on eliminating.
 
 - **No support for Slot Sharing.** In hybrid shuffle mode, Flink currently 
forces each task to be executed in a dedicated slot exclusively. If slot 
sharing is explicitly specified, an error will occur.
-- **No support for Adaptive Batch Scheduler and Speculative Execution.** If 
adaptive batch scheduler is used in hybrid shuffle mode, an error will occur.
+- **No optimization for dynamic graph.** If auto-parallelism(dynamic graph) is 
enabled for `AdaptiveBatchScheduler`, hybrid shuffle will always schedule tasks 
only when all producer are finished like blocking shuffle, this means that the 
constraint will fall back to `ALL_PRODUCERS_FINISHED` in this case.

Review Comment:
   ```suggestion
   - **No pipelined execution for dynamic graph.** If auto-parallelism (dynamic 
graph) is enabled, Adaptive Batch Scheduler will wait until upstream tasks 
finish to decide the parallelism of downstream tasks, which means hybrid shuffle 
effectively falls back to blocking shuffle (`ALL_PRODUCERS_FINISHED` constraint).
   ```
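
   To sanity-check how the defaults and the fallback described in the diff 
interact, here is a minimal Python sketch. It is not Flink code: 
`effective_constraint` and its parameters are hypothetical names, and the order 
in which the rules are applied is an assumption. Only the rules themselves come 
from the doc text quoted above.

   ```python
from enum import Enum
from typing import Optional


class Constraint(Enum):
    ALL_PRODUCERS_FINISHED = "ALL_PRODUCERS_FINISHED"
    ONLY_FINISHED_PRODUCERS = "ONLY_FINISHED_PRODUCERS"
    UNFINISHED_PRODUCERS = "UNFINISHED_PRODUCERS"


def effective_constraint(
    configured: Optional[Constraint],
    speculative_execution: bool,
    auto_parallelism: bool,
) -> Constraint:
    """Hypothetical helper: resolve the constraint that is effectively applied."""
    # The docs state UNFINISHED_PRODUCERS is not supported with speculative execution.
    if speculative_execution and configured is Constraint.UNFINISHED_PRODUCERS:
        raise ValueError("UNFINISHED_PRODUCERS is not supported with speculative execution")
    # With auto-parallelism (dynamic graph), downstream parallelism is decided only
    # after upstream tasks finish, so the behavior matches blocking shuffle.
    if auto_parallelism:
        return Constraint.ALL_PRODUCERS_FINISHED
    # An explicitly configured value wins over the defaults (assumed ordering).
    if configured is not None:
        return configured
    # Defaults as described in the quoted docs.
    if speculative_execution:
        return Constraint.ONLY_FINISHED_PRODUCERS
    return Constraint.UNFINISHED_PRODUCERS
   ```

   For example, `effective_constraint(None, False, True)` resolves to 
`ALL_PRODUCERS_FINISHED`, which is the fallback this limitation describes.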



##########
docs/content/docs/ops/batch/batch_shuffle.md:
##########
@@ -112,14 +112,27 @@ Hybrid shuffle provides two spilling strategies:
 
 ### Usage
 
-To use hybrid shuffle mode, you need to configure the 
[execution.batch-shuffle-mode]({{< ref "docs/deployment/config" 
>}}#execution.batch-shuffle-mode) to `ALL_EXCHANGES_HYBRID_FULL` (full spilling 
strategy) or `ALL_EXCHANGES_HYBRID_SELECTIVE` (selective spilling strategy).
+To use hybrid shuffle mode, you need to configure the 
[execution.batch-shuffle-mode]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) to `ALL_EXCHANGES_HYBRID_FULL` (full spilling 
strategy) or `ALL_EXCHANGES_HYBRID_SELECTIVE` (selective spilling strategy).
+
+#### Data Consumption Constraints
+
+Hybrid shuffle divides the partition data consumption constraints between 
producer and consumer into the following three cases:
+
+- **ALL_PRODUCERS_FINISHED** : hybrid partition data can be consumed only when 
all producers are finished.
+- **ONLY_FINISHED_PRODUCERS** : hybrid partition data can be consumed when its 
producer is finished.
+- **UNFINISHED_PRODUCERS** : hybrid partition data can be consumed even if its 
producer is un-finished.

Review Comment:
   ```suggestion
   - **ONLY_FINISHED_PRODUCERS** : hybrid partition can only consume data from 
finished producers.
   - **UNFINISHED_PRODUCERS** : hybrid partition can consume data from 
unfinished producers.
   ```
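
   As a quick check on the wording of the three cases, the sketch below models 
them in Python. It is only an illustration of how I read the descriptions: 
`can_consume` and its arguments are hypothetical names, not part of Flink.

   ```python
from typing import Sequence


def can_consume(constraint: str, finished: Sequence[bool], producer: int) -> bool:
    """Hypothetical model: may data from `producer` be consumed right now?

    `finished[i]` is True when producer i has finished.
    """
    if constraint == "ALL_PRODUCERS_FINISHED":
        # Nothing is consumable until every producer has finished.
        return all(finished)
    if constraint == "ONLY_FINISHED_PRODUCERS":
        # Only data from producers that have already finished is consumable.
        return finished[producer]
    if constraint == "UNFINISHED_PRODUCERS":
        # Data is consumable even while its producer is still running.
        return True
    raise ValueError(f"unknown constraint: {constraint}")
   ```

   With `finished = [True, False]`, producer 0 is consumable under 
`ONLY_FINISHED_PRODUCERS` but not under `ALL_PRODUCERS_FINISHED`, while 
producer 1 is consumable only under `UNFINISHED_PRODUCERS`.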


