xintongsong commented on code in PR #21890:
URL: https://github.com/apache/flink/pull/21890#discussion_r1101054167
##########
docs/content.zh/docs/ops/batch/batch_shuffle.md:
##########
@@ -114,12 +114,34 @@ Hybrid shuffle provides two spilling strategies:
To use hybrid shuffle mode, you need to configure the
[execution.batch-shuffle-mode]({{< ref "docs/deployment/config"
>}}#execution-batch-shuffle-mode) to `ALL_EXCHANGES_HYBRID_FULL` (full spilling
strategy) or `ALL_EXCHANGES_HYBRID_SELECTIVE` (selective spilling strategy).
+#### Supports AdaptiveBatchScheduler and SpeculativeExecution
+
+Hybrid shuffle currently supports `AdaptiveBatchScheduler` by default. If you
want to use `DefaultScheduler`, please configure the [jobmanager.scheduler]({{<
ref "docs/deployment/config" >}}#jobmanager-scheduler) to `DefaultScheduler`.
See [elastic_scaling]({{< ref "docs/deployment/elastic_scaling"
>}}#adaptive-batch-scheduler) for details.
+
+If you want to enable `SpeculativeExecution` in the same time, see
[speculative_execution]({{< ref "docs/deployment/speculative_execution" >}})
for details.
Review Comment:
This is irrelevant to hybrid shuffle.
##########
docs/content.zh/docs/ops/batch/batch_shuffle.md:
##########
@@ -114,12 +114,34 @@ Hybrid shuffle provides two spilling strategies:
To use hybrid shuffle mode, you need to configure the
[execution.batch-shuffle-mode]({{< ref "docs/deployment/config"
>}}#execution-batch-shuffle-mode) to `ALL_EXCHANGES_HYBRID_FULL` (full spilling
strategy) or `ALL_EXCHANGES_HYBRID_SELECTIVE` (selective spilling strategy).
+#### Supports AdaptiveBatchScheduler and SpeculativeExecution
+
+Hybrid shuffle currently supports `AdaptiveBatchScheduler` by default. If you
want to use `DefaultScheduler`, please configure the [jobmanager.scheduler]({{<
ref "docs/deployment/config" >}}#jobmanager-scheduler) to `DefaultScheduler`.
See [elastic_scaling]({{< ref "docs/deployment/elastic_scaling"
>}}#adaptive-batch-scheduler) for details.
+
+If you want to enable `SpeculativeExecution` in the same time, see
[speculative_execution]({{< ref "docs/deployment/speculative_execution" >}})
for details.
+
+Hybrid shuffle divides the partition data consumption constraints between
producer and consumer into the following three cases:
+
+- **ALL_PRODUCERS_FINISHED** : hybrid partition data can be consumed only when
all producers are finished.
+- **ONLY_FINISHED_PRODUCERS** : hybrid partition data can be consumed when its
producer is finished.
+- **UNFINISHED_PRODUCERS** : hybrid partition data can be consumed even if its
producer is un-finished.
+
+If `SpeculativeExecution` is enabled, the default constraint is
`ONLY_FINISHED_PRODUCERS` to bring some performance optimization compared with
blocking shuffle. Otherwise, the default constraint is `UNFINISHED_PRODUCERS`
to perform pipelined-like shuffle. These could be configured via
[jobmanager.partition.hybrid.partition-data-consume-constraint]({{< ref
"docs/deployment/config"
>}}#jobmanager-partition-hybrid-partition-data-consume-constraint).
Review Comment:
What are the potential impacts of changing this option?
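For context, a minimal sketch of how the option in question would be set explicitly in the Flink configuration file. The keys and values are copied from the quoted diff and have not been checked against a released version; the values chosen are illustrative only, not a recommendation.

```yaml
# Sketch only: keys and values come from the quoted diff.
# Enable hybrid shuffle with the full spilling strategy.
execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL
# Explicitly override the default consume constraint.
jobmanager.partition.hybrid.partition-data-consume-constraint: ONLY_FINISHED_PRODUCERS
```

Per the diff, `ONLY_FINISHED_PRODUCERS` is the default when `SpeculativeExecution` is enabled and `UNFINISHED_PRODUCERS` is the default otherwise. The diff does not state what happens when a user overrides that default, which is the gap this comment points at.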
##########
docs/content.zh/docs/ops/batch/batch_shuffle.md:
##########
@@ -114,12 +114,34 @@ Hybrid shuffle provides two spilling strategies:
To use hybrid shuffle mode, you need to configure the
[execution.batch-shuffle-mode]({{< ref "docs/deployment/config"
>}}#execution-batch-shuffle-mode) to `ALL_EXCHANGES_HYBRID_FULL` (full spilling
strategy) or `ALL_EXCHANGES_HYBRID_SELECTIVE` (selective spilling strategy).
+#### Supports AdaptiveBatchScheduler and SpeculativeExecution
+
+Hybrid shuffle currently supports `AdaptiveBatchScheduler` by default. If you
want to use `DefaultScheduler`, please configure the [jobmanager.scheduler]({{<
ref "docs/deployment/config" >}}#jobmanager-scheduler) to `DefaultScheduler`.
See [elastic_scaling]({{< ref "docs/deployment/elastic_scaling"
>}}#adaptive-batch-scheduler) for details.
+
+If you want to enable `SpeculativeExecution` in the same time, see
[speculative_execution]({{< ref "docs/deployment/speculative_execution" >}})
for details.
+
+Hybrid shuffle divides the partition data consumption constraints between
producer and consumer into the following three cases:
+
+- **ALL_PRODUCERS_FINISHED** : hybrid partition data can be consumed only when
all producers are finished.
+- **ONLY_FINISHED_PRODUCERS** : hybrid partition data can be consumed when its
producer is finished.
+- **UNFINISHED_PRODUCERS** : hybrid partition data can be consumed even if its
producer is un-finished.
+
+If `SpeculativeExecution` is enabled, the default constraint is
`ONLY_FINISHED_PRODUCERS` to bring some performance optimization compared with
blocking shuffle. Otherwise, the default constraint is `UNFINISHED_PRODUCERS`
to perform pipelined-like shuffle. These could be configured via
[jobmanager.partition.hybrid.partition-data-consume-constraint]({{< ref
"docs/deployment/config"
>}}#jobmanager-partition-hybrid-partition-data-consume-constraint).
+
+#### Index Spilling
+
+Hybrid shuffle indexes the shuffle data in memory and disk. Generally
speaking, all index can be cached in memory to speed up index retrieval.
However, for large batch jobs, this part of memory may bring OOM risks.
+Therefore, hybrid shuffle supports spilling index data to disk. The following
configuration options can control this behavior:
+
+-
**[taskmanager.network.hybrid-shuffle.num-retained-in-memory-regions-max]({{<
ref "docs/deployment/config"
>}}#taskmanager-network-hybrid-shuffle-num-retained-in-memory-regions-max)** :
Controls the max number of hybrid retained regions in memory. Increasing this
value will allow more index entries to be cached in memory.
+- **[taskmanager.network.hybrid-shuffle.spill-index-segment-size]({{< ref
"docs/deployment/config"
>}}#taskmanager-network-hybrid-shuffle-spill-index-segment-size)** : Controls
the segment size(in bytes) of hybrid spilled file data index.
Review Comment:
I wonder whether we want to advertise these options. They are likely to be
removed in future releases.