[ https://issues.apache.org/jira/browse/BEAM-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Etienne Chauchot updated BEAM-7647:
-----------------------------------
Description:
In combine globally:
{code:java}
Iterable<WindowedValue<OutputT>> output =
    sparkCombineFn.extractOutput(maybeAccumulated.get());
outRdd =
    context
        .getSparkContext()
        .parallelize(CoderHelpers.toByteArrays(output, wvoCoder))
        .map(CoderHelpers.fromByteFunction(wvoCoder));
{code}
=> Risk of OOM: the whole output is materialized as an in-memory list, and all of the data is pulled to a single node (the driver).
was:
In combine globally:
{code:java}
Iterable<WindowedValue<OutputT>> output =
    sparkCombineFn.extractOutput(maybeAccumulated.get());
outRdd =
    context
        .getSparkContext()
        .parallelize(CoderHelpers.toByteArrays(output, wvoCoder))
        .map(CoderHelpers.fromByteFunction(wvoCoder));
{code}
=> risk of OOM in the list, shuffle data to a single worker (the driver)
> CombineGlobally translation is risky and not very performant.
> -------------------------------------------------------------
>
> Key: BEAM-7647
> URL: https://issues.apache.org/jira/browse/BEAM-7647
> Project: Beam
> Issue Type: Improvement
> Components: runner-spark
> Reporter: Etienne Chauchot
> Priority: Major
>
> In combine globally:
> {code:java}
> Iterable<WindowedValue<OutputT>> output =
>     sparkCombineFn.extractOutput(maybeAccumulated.get());
> outRdd =
>     context
>         .getSparkContext()
>         .parallelize(CoderHelpers.toByteArrays(output, wvoCoder))
>         .map(CoderHelpers.fromByteFunction(wvoCoder));
> {code}
> => Risk of OOM: the whole output is materialized as an in-memory list, and all of the data is pulled to a single node (the driver).
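
To illustrate the concern in the description, here is a minimal, self-contained sketch in plain Java with no Spark or Beam dependencies. The class name `DriverMaterializationSketch` and both helper methods are hypothetical stand-ins, not the runner's code. `toByteArraysEagerly` only mimics the shape of the `CoderHelpers.toByteArrays(output, wvoCoder)` call quoted above, under the assumption that the call builds the complete list before `parallelize` is invoked.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DriverMaterializationSketch {

  // Hypothetical stand-in for the encoding step: every element is encoded
  // up front and all encoded copies are kept in one in-memory list.
  static List<byte[]> toByteArraysEagerly(Iterable<String> values) {
    List<byte[]> encoded = new ArrayList<>();
    for (String value : values) {
      encoded.add(value.getBytes(StandardCharsets.UTF_8));
    }
    return encoded;
  }

  // Total number of encoded bytes that the list keeps alive at the same time.
  static long heldBytes(List<byte[]> encoded) {
    long total = 0;
    for (byte[] bytes : encoded) {
      total += bytes.length;
    }
    return total;
  }

  public static void main(String[] args) {
    List<byte[]> encoded =
        toByteArraysEagerly(Arrays.asList("window-1", "window-2", "window-3"));
    // All encoded entries live in this single JVM before anything is distributed.
    System.out.println(
        encoded.size() + " entries, " + heldBytes(encoded) + " bytes held in one JVM");
  }
}
```

In this sketch the memory held in one JVM grows linearly with the number and size of the output elements, which is the pattern the description flags for the driver.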