GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/1503
[FLINK-3232] Add option to eagerly deploy channels
Adds a flag to the ExecutionGraph's IntermediateResult class indicating
whether the result consumers should be deployed eagerly. If true, the consumers
are deployed as soon as the partition is registered at the
ResultPartitionManager of the task manager. In practice, the deployment boils
down to updating unknown input channels of the consumers (because the actual
tasks are actually deployed all at once).
This behaviour is configured in the JobGraph generator and only activated
for streaming programs (StreamingJobGraphGenerator). It only makes sense for
pipelined results.
The motivation is to get down the latency of the first records passing a
pipeline. The initial update of the input channels causes a higher latency.
You can see this effect in the StreamingScalabilityAndLatency class (manual
test).
At the moment, this results in duplicate Akka messages when the first
record is produced (the message travels from the task to the job manager and
from the job manager to task manager, which then will be ignored at the
InputGate).
You can verify the decreased latency by adding a `Thread.sleep(2000)` to
`StreamingScalabilityAndLatency.TimeStampingSource`. In order to compare to the
old solution, uncomment lines 348--354 in `NetworkEnvironment`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 3232-eager_channels
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1503.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1503
----
commit 6946600e538ca6fa1f54cf87db94c48af0a3f0e8
Author: Ufuk Celebi <[email protected]>
Date: 2016-01-13T17:45:35Z
[FLINK-3232] Add option to eagerly deploy channels
Adds a flag to the ExecutionGraph's IntermediateResult class indicating
whether
the result consumers should be deployed eagerly. If true, the consumers are
deployed as soon as the partition is registered at the
ResultPartitionManager of
the task manager. In practice, the deployment boils down to updating unknown
input channels of the consumers (because the actual tasks are actually
deployed
all at once).
This behaviour is configured in the JobGraph generator and only activated
for
streaming programs (StreamingJobGraphGenerator). It only makes sense for
pipelined results.
The motivation is to get down the latency of the first records passing a
pipeline. The initial update of the input channels causes a higher latency.
You can see this effect in the StreamingScalabilityAndLatency class (manual
test).
At the moment, this results in duplicate Akka messages when the first record
is produced (the message travels from the task to the job manager and from
the
job manager to task manager, which then will be ignored at the InputGate).
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---