The first commit comes from https://github.com/apache/flink/pull/6697
This solves GC issues for cases with low latency (small flushTimeout) and many
output channels and generally significantly improves low latency performance.
OutputFlusher remains for now to trigger flushes for local subpartitions.
Registering periodic flushes in netty is unfortunately not the most beautiful
thing in the world at the moment. It is complicated by two things:
1. we know flushTimeout only in flink-streaming-java and
StreamTask, which is long after the point at which we actually create
subpartitions
2. we do not know beforehand which subpartitions will be local and which
will be remote
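
To illustrate the general idea only, here is a minimal sketch of registering a periodic flush late, once flushTimeout is known, on an event-loop style scheduler. It is not the code in this PR: `PeriodicFlushSketch`, `RemoteSubpartition`, `flushIfPending` and `registerPeriodicFlush` are hypothetical names, and a plain `ScheduledExecutorService` stands in for the netty event loop.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; these are not Flink's or netty's actual classes. */
final class PeriodicFlushSketch {

    /** Stand-in for a remote subpartition that may hold unflushed data. */
    static final class RemoteSubpartition {
        private final AtomicBoolean hasUnflushedData = new AtomicBoolean(false);
        private final AtomicInteger flushCount = new AtomicInteger();

        /** Marks that a record was written since the last flush. */
        void write() {
            hasUnflushedData.set(true);
        }

        /** Flushes only if something was written since the last flush. */
        boolean flushIfPending() {
            if (hasUnflushedData.compareAndSet(true, false)) {
                flushCount.incrementAndGet();
                return true;
            }
            return false;
        }

        int flushCount() {
            return flushCount.get();
        }
    }

    /**
     * Registers the periodic flush after the subpartition already exists,
     * mirroring the fact that flushTimeout becomes known only later.
     */
    static ScheduledFuture<?> registerPeriodicFlush(
            ScheduledExecutorService eventLoop,
            RemoteSubpartition subpartition,
            long flushTimeoutMillis) {
        return eventLoop.scheduleWithFixedDelay(
                subpartition::flushIfPending,
                flushTimeoutMillis,
                flushTimeoutMillis,
                TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Single-threaded scheduler as an assumed stand-in for an event loop.
        ScheduledExecutorService eventLoop = Executors.newSingleThreadScheduledExecutor();
        RemoteSubpartition subpartition = new RemoteSubpartition();

        ScheduledFuture<?> registration = registerPeriodicFlush(eventLoop, subpartition, 1);
        subpartition.write();
        Thread.sleep(50); // give the scheduled task time to run

        registration.cancel(false);
        eventLoop.shutdown();
        System.out.println("flushes: " + subpartition.flushCount());
    }
}
```

In this sketch the periodic task does nothing when no data is pending, so an idle subpartition costs one cheap check per flushTimeout interval.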

Average throughput is significantly higher only for extreme cases; however, the
most important improvement here is solving (mitigating?) the current GC issues,
which is visible on the "min" graph. Without this change, 1ms latency with 1000+
output channels suffers from frequent, very long GC pauses.
## Verifying this change
This change is covered by existing network stack tests, stress tests and almost
all ITCases.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (**yes** / no /
don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs /
JavaDocs / not documented)
[ Full content available at: https://github.com/apache/flink/pull/6698 ]