[
https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nico Kruber updated FLINK-9913:
-------------------------------
Description:
Currently the {{RecordWriter}} emits output into multi channels via
{{ChannelSelector}} or broadcasts output to all channels directly. Each
channel has a separate {{RecordSerializer}} for serializing outputs, that means
the output will be serialized as many times as the number of selected channels.
As we know, data serialization is a high cost operation, so we can get good
benefits by improving the serialization only once.
I would suggest the following changes for realizing it.
# Only one {{RecordSerializer}} is created in {{RecordWriter}} for all the
channels.
# The output is serialized into the intermediate data buffer only once for
different channels.
# The intermediate serialization results are copied into different
{{BufferBuilder}}s for different channels.
An additional benefit by using a single serializer for all channels is that we
get a potentially significant reduction on heap space overhead from fewer
intermediate serialization buffers (only once we got over 5MiB, these buffers
were pruned back to 128B!).
was:
Currently the {{RecordWriter}} emits output into multi channels via
{{ChannelSelector}} or broadcasts output to all channels directly. Each
channel has a separate {{RecordSerializer}} for serializing outputs, that means
the output will be serialized as many times as the number of selected channels.
As we know, data serialization is a high cost operation, so we can get good
benefits by improving the serialization only once.
I would suggest the following changes for realizing it.
# Only one {{RecordSerializer}} is created in {{RecordWriter}} for all the
channels.
# The output is serialized into the intermediate data buffer only once for
different channels.
# The intermediate serialization results are copied into different
{{BufferBuilder}}s for different channels.
> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
> Key: FLINK-9913
> URL: https://issues.apache.org/jira/browse/FLINK-9913
> Project: Flink
> Issue Type: Improvement
> Components: Network
> Affects Versions: 1.6.0
> Reporter: zhijiang
> Assignee: zhijiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> Currently the {{RecordWriter}} emits output into multi channels via
> {{ChannelSelector}} or broadcasts output to all channels directly. Each
> channel has a separate {{RecordSerializer}} for serializing outputs, that
> means the output will be serialized as many times as the number of selected
> channels.
> As we know, data serialization is a high cost operation, so we can get good
> benefits by improving the serialization only once.
> I would suggest the following changes for realizing it.
> # Only one {{RecordSerializer}} is created in {{RecordWriter}} for all the
> channels.
> # The output is serialized into the intermediate data buffer only once for
> different channels.
> # The intermediate serialization results are copied into different
> {{BufferBuilder}}s for different channels.
> An additional benefit by using a single serializer for all channels is that
> we get a potentially significant reduction on heap space overhead from fewer
> intermediate serialization buffers (only once we got over 5MiB, these buffers
> were pruned back to 128B!).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)