[ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nico Kruber updated FLINK-9913:
-------------------------------
    Description:
Currently the {{RecordWriter}} emits output to multiple channels via a {{ChannelSelector}}, or broadcasts output to all channels directly. Each channel has its own {{RecordSerializer}} for serializing output, which means that each record is serialized as many times as there are selected channels.

Serialization is an expensive operation, so serializing each record only once should yield a noticeable benefit.

I would suggest the following changes to achieve this:
# Only one {{RecordSerializer}} is created in {{RecordWriter}} for all the channels.
# Each record is serialized into the intermediate data buffer only once, regardless of the number of target channels.
# The intermediate serialization result is copied into the {{BufferBuilder}} of each target channel.

An additional benefit of using a single serializer for all channels is a potentially significant reduction in heap space overhead, because there are fewer intermediate serialization buffers (these buffers were only pruned back to 128 B once they had grown beyond 5 MiB!).

  was:
Currently the {{RecordWriter}} emits output to multiple channels via a {{ChannelSelector}}, or broadcasts output to all channels directly. Each channel has its own {{RecordSerializer}} for serializing output, which means that each record is serialized as many times as there are selected channels.

Serialization is an expensive operation, so serializing each record only once should yield a noticeable benefit.

I would suggest the following changes to achieve this:
# Only one {{RecordSerializer}} is created in {{RecordWriter}} for all the channels.
# Each record is serialized into the intermediate data buffer only once, regardless of the number of target channels.
# The intermediate serialization result is copied into the {{BufferBuilder}} of each target channel.
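
The three proposed steps can be illustrated with a small, self-contained sketch. It does not use Flink's actual classes: {{Writer}}, {{ChannelBuffer}} and the length-prefixed UTF-8 encoding are hypothetical stand-ins for {{RecordWriter}}, {{BufferBuilder}} and the real {{RecordSerializer}} format. They only show the idea of serializing once and copying the bytes to every target channel.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SerializeOnceSketch {

    /** Hypothetical stand-in for a per-channel BufferBuilder: it only accumulates bytes. */
    static final class ChannelBuffer {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        void append(ByteBuffer serialized) {
            // Read through a duplicate so the shared buffer's position stays untouched
            // and the same serialized bytes can be copied to the next channel.
            ByteBuffer view = serialized.duplicate();
            byte[] chunk = new byte[view.remaining()];
            view.get(chunk);
            bytes.write(chunk, 0, chunk.length);
        }

        byte[] toByteArray() {
            return bytes.toByteArray();
        }
    }

    /** Hypothetical writer with a single serialization step shared by all channels. */
    static final class Writer {
        private final List<ChannelBuffer> channels = new ArrayList<>();
        private int serializationCount = 0;

        Writer(int numChannels) {
            for (int i = 0; i < numChannels; i++) {
                channels.add(new ChannelBuffer());
            }
        }

        // Steps 1 and 2: one serializer, invoked once per record.
        // The length-prefixed UTF-8 format is an assumption made for this sketch.
        private ByteBuffer serialize(String record) {
            serializationCount++;
            byte[] payload = record.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
            buf.putInt(payload.length).put(payload);
            buf.flip();
            return buf;
        }

        // Step 3: copy the intermediate result into each selected channel.
        void emit(String record, int... targetChannels) {
            ByteBuffer serialized = serialize(record);
            for (int channel : targetChannels) {
                channels.get(channel).append(serialized);
            }
        }

        // Broadcasting also serializes once, then copies to every channel.
        void broadcast(String record) {
            ByteBuffer serialized = serialize(record);
            for (ChannelBuffer channel : channels) {
                channel.append(serialized);
            }
        }

        int serializationCount() {
            return serializationCount;
        }

        byte[] channelBytes(int channel) {
            return channels.get(channel).toByteArray();
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer(3);
        writer.broadcast("abc");
        writer.emit("hello", 0, 2);
        System.out.println("serializations: " + writer.serializationCount());
        System.out.println("bytes in channel 0: " + writer.channelBytes(0).length);
        System.out.println("bytes in channel 1: " + writer.channelBytes(1).length);
    }
}
```

In this sketch the cost per record is one serialization plus one byte copy per target channel, instead of one full serialization per target channel.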
> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)