Jan Lukavský created BEAM-7574:
----------------------------------
Summary: Spark runner: Combine.perKey performance issues
Key: BEAM-7574
URL: https://issues.apache.org/jira/browse/BEAM-7574
Project: Beam
Issue Type: Improvement
Components: runner-spark
Affects Versions: 2.13.0
Reporter: Jan Lukavský
Assignee: Jan Lukavský
The current implementation of Combine.perKey creates an accumulator for each
input key and then merges all of these accumulators together.
Optimize this by:
* change the accumulator from Iterable to Map, and use addInput as much as
possible
* try to move the window explode to pre-shuffle (add the window label to the key
for non-merging windows), measure the impact, and if the impact is substantial,
implement that at least for window functions that assign to a single (global)
window or to a single window per element (tumbling windows)
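
The two ideas above can be illustrated with a minimal, Beam-independent sketch.
It is not the Spark runner's code: SimpleCombineFn, SumFn, Element and the
string-based composite key are hypothetical stand-ins chosen for illustration,
and only the method names createAccumulator, addInput and mergeAccumulators
mirror Beam's CombineFn. The sketch compares creating one accumulator per
element and merging them with keeping a Map of accumulators keyed by
(key, window) and feeding it through addInput.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombinePerKeySketch {

  /** Hypothetical, reduced stand-in for Beam's CombineFn. */
  interface SimpleCombineFn<InT, AccT> {
    AccT createAccumulator();
    AccT addInput(AccT acc, InT input);
    AccT mergeAccumulators(AccT left, AccT right);
  }

  /** Sums longs and counts how many accumulators were created. */
  static class SumFn implements SimpleCombineFn<Long, long[]> {
    int created = 0;

    public long[] createAccumulator() {
      created++;
      return new long[] {0L};
    }

    public long[] addInput(long[] acc, Long input) {
      acc[0] += input;
      return acc;
    }

    public long[] mergeAccumulators(long[] left, long[] right) {
      left[0] += right[0];
      return left;
    }
  }

  /** Hypothetical input record: key, window label (non-merging window), value. */
  static class Element {
    final String key;
    final String window;
    final long value;

    Element(String key, String window, long value) {
      this.key = key;
      this.window = window;
      this.value = value;
    }
  }

  /** Window label folded into the key; a plain string is an assumption for brevity. */
  static String compositeKey(Element e) {
    return e.key + "@" + e.window;
  }

  /** Baseline: a fresh accumulator per element, merged into the running one. */
  static Map<String, Long> accumulatorPerElement(List<Element> input, SumFn fn) {
    Map<String, long[]> accs = new HashMap<>();
    for (Element e : input) {
      long[] single = fn.addInput(fn.createAccumulator(), e.value);
      accs.merge(compositeKey(e), single, fn::mergeAccumulators);
    }
    return extract(accs);
  }

  /** Proposed: one accumulator per (key, window) in a Map, updated via addInput. */
  static Map<String, Long> mapWithAddInput(List<Element> input, SumFn fn) {
    Map<String, long[]> accs = new HashMap<>();
    for (Element e : input) {
      String k = compositeKey(e);
      long[] acc = accs.computeIfAbsent(k, unused -> fn.createAccumulator());
      accs.put(k, fn.addInput(acc, e.value));
    }
    return extract(accs);
  }

  /** Turns the accumulators into output values. */
  static Map<String, Long> extract(Map<String, long[]> accs) {
    Map<String, Long> out = new HashMap<>();
    accs.forEach((k, acc) -> out.put(k, acc[0]));
    return out;
  }
}
```

Both methods produce the same per-(key, window) sums, but the Map-based variant
creates one accumulator per distinct (key, window) pair instead of one per
element. Whether this translates into a measurable gain in the Spark runner is
exactly what the issue proposes to measure.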
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)