Vaclav Plajt created BEAM-6332:
----------------------------------
Summary: Avoid unnecessary serialization steps when executing
`combine` transformation
Key: BEAM-6332
URL: https://issues.apache.org/jira/browse/BEAM-6332
Project: Beam
Issue Type: Improvement
Components: runner-spark
Reporter: Vaclav Plajt
Assignee: Vaclav Plajt
Combine transformation is translated into Spark's RDD API in
[GroupCombineFunctions](https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java)
`combinePerKey` and `combineGlobally` methods. Both methods use byte arrays as
intermediate state of aggregation so they can be transferred over network. That
leads to serialization and de-serialization of intermediate aggregation value
every time new element is added to aggregation. That is unnecessary and should
be avoided.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)