Etienne Chauchot created BEAM-7647:
--------------------------------------
Summary: CombineGlobally translation is risky and not very
performant.
Key: BEAM-7647
URL: https://issues.apache.org/jira/browse/BEAM-7647
Project: Beam
Issue Type: Improvement
Components: runner-spark
Reporter: Etienne Chauchot
In the CombineGlobally translation of the Spark runner:
{code:java}
Iterable<WindowedValue<OutputT>> output =
    sparkCombineFn.extractOutput(maybeAccumulated.get());
outRdd =
    context
        .getSparkContext()
        .parallelize(CoderHelpers.toByteArrays(output, wvoCoder))
        .map(CoderHelpers.fromByteFunction(wvoCoder));
{code}
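=> The whole combined output is encoded into a single in-memory list before it is parallelized again, so there is a risk of OOM in that list. It also means the data is shuffled to a single worker (the driver).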
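
To illustrate the memory concern outside of Spark, here is a minimal plain-Java sketch. It is not the runner code and not a proposed fix: the class EagerVsLazyEncoding and its methods are hypothetical names, and String/UTF-8 stands in for WindowedValue and its coder. It only contrasts encoding every element into one list up front with encoding elements one at a time as they are consumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EagerVsLazyEncoding {

  // Hypothetical stand-in for encoding a single element with a coder.
  public static byte[] encode(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }

  // Eager: all encoded elements live in one list at the same time,
  // which is the shape of building a full list before parallelizing it.
  public static List<byte[]> encodeAll(Iterable<String> values) {
    List<byte[]> out = new ArrayList<>();
    for (String v : values) {
      out.add(encode(v));
    }
    return out;
  }

  // Lazy: each element is encoded only when the consumer asks for it,
  // so only one encoded element needs to be held at a time.
  public static Iterable<byte[]> encodeLazily(Iterable<String> values) {
    return () -> {
      Iterator<String> it = values.iterator();
      return new Iterator<byte[]>() {
        @Override
        public boolean hasNext() {
          return it.hasNext();
        }

        @Override
        public byte[] next() {
          return encode(it.next());
        }
      };
    };
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("a", "bb", "ccc");

    // Eager path: the list holds all three byte arrays at once.
    System.out.println(encodeAll(input).size());

    // Lazy path: byte arrays are produced one by one during iteration.
    int totalBytes = 0;
    for (byte[] encoded : encodeLazily(input)) {
      totalBytes += encoded.length;
    }
    System.out.println(totalBytes);
  }
}
```

With a small input both paths behave the same. The difference only matters when the combined output is large, where the eager list has to fit in the memory of one JVM.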