[
https://issues.apache.org/jira/browse/BEAM-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vaclav Plajt updated BEAM-6332:
-------------------------------
Summary: Avoid unnecessary serialization steps when executing combine
transfor (was: Avoid unnecessary serialization steps when executing combine
transformation)
> Avoid unnecessary serialization steps when executing combine transfor
> ---------------------------------------------------------------------
>
> Key: BEAM-6332
> URL: https://issues.apache.org/jira/browse/BEAM-6332
> Project: Beam
> Issue Type: Improvement
> Components: runner-spark
> Reporter: Vaclav Plajt
> Assignee: Vaclav Plajt
> Priority: Major
>
> Combine transformation is translated into Spark's RDD API in
> [GroupCombineFunctions](https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java)
> `combinePerKey` and `combineGlobally` methods. Both methods use byte arrays
> as intermediate state of aggregation so they can be transferred over network.
> That leads to serialization and de-serialization of intermediate aggregation
> value every time new element is added to aggregation. That is unnecessary and
> should be avoided.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)