Etienne Chauchot created BEAM-7647:
--------------------------------------

             Summary: CombineGlobally translation is risky and not very performant.
                 Key: BEAM-7647
                 URL: https://issues.apache.org/jira/browse/BEAM-7647
             Project: Beam
          Issue Type: Improvement
          Components: runner-spark
            Reporter: Etienne Chauchot


In the CombineGlobally translation:
{code:java}
Iterable<WindowedValue<OutputT>> output =
    sparkCombineFn.extractOutput(maybeAccumulated.get());
outRdd =
    context
        .getSparkContext()
        .parallelize(CoderHelpers.toByteArrays(output, wvoCoder))
        .map(CoderHelpers.fromByteFunction(wvoCoder));
{code}
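=> The output is materialized as a list of byte arrays before being parallelized, which risks an OOM in that list, and all of the data is shuffled to a single node (the driver).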
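
To make the memory concern concrete, here is a minimal stand-alone sketch. It is not Beam or Spark code: the class name, the `encode` helper, and the modelling of partitions as nested lists are all assumptions made for illustration. It only compares how many encoded elements a single node has to hold when everything is gathered into one list versus when each node keeps only its own partition.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DriverListSketch {

    // Hypothetical stand-in for a coder: encodes an int as 4 bytes.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Models the pattern flagged above: every encoded value ends up
    // in one list held by a single node, so its size grows with the
    // total amount of data.
    static List<byte[]> encodeAllOnOneNode(List<List<Integer>> partitions) {
        List<byte[]> all = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            for (int value : partition) {
                all.add(encode(value));
            }
        }
        return all;
    }

    // Models a distributed layout: each node encodes only its own
    // partition. Returns the size of the largest per-node list.
    static int largestPerNodeList(List<List<Integer>> partitions) {
        int largest = 0;
        for (List<Integer> partition : partitions) {
            List<byte[]> local = new ArrayList<>();
            for (int value : partition) {
                local.add(encode(value));
            }
            largest = Math.max(largest, local.size());
        }
        return largest;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2, 3),
            Arrays.asList(4, 5),
            Arrays.asList(6, 7, 8, 9));
        System.out.println("single-node list size: "
            + encodeAllOnOneNode(partitions).size());
        System.out.println("largest per-node list size: "
            + largestPerNodeList(partitions));
    }
}
```

In this toy model the single-node list grows with the total number of elements, while the per-node lists are bounded by the largest partition. Whether the real translation can avoid the driver-side list is a separate question that this sketch does not answer.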

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
