Hi Jean,

Thanks for your response. So when can we expect Spark 2.x support in the Spark runner?
Thanks,
Nishu

On Mon, Nov 13, 2017 at 11:53 AM, Jean-Baptiste Onofré <[email protected]> wrote:

> Hi,
>
> Regarding your questions:
>
> 1. Not yet, but as you might have seen on the mailing list, we have a PR
> for Spark 2.x support.
>
> 2. We have additional triggers supported and in progress. GroupByKey and
> Accumulator are also supported.
>
> 3. No, I made a change that allows you to define the default storage
> level (via the pipeline options). The runner also automatically decides
> when to persist an RDD by analyzing the DAG.
>
> 4. Yes, it's supported.
>
> Regards
> JB
>
> On 11/13/2017 10:50 AM, Nishu wrote:
>
>> Hi Team,
>>
>> I am writing a streaming pipeline in Apache Beam using the Spark runner.
>> Use case: joining multiple Kafka streams using windowed collections.
>> I use GroupByKey to group events by a common business key, and that
>> output is used as input for the Join operation. The pipeline runs on the
>> direct runner as expected, but on a Spark cluster (v2.1) it throws an
>> Accumulator error:
>> *"Exception in thread "main" java.lang.AssertionError: assertion failed:
>> copyAndReset must return a zero value copy"*
>>
>> I tried the same pipeline on a Spark cluster (v1.6); there it runs
>> without any error, but it doesn't perform the join operations on the
>> streams.
>>
>> I have a couple of questions:
>>
>> 1. Does the Spark runner support Spark version 2.x?
>>
>> 2. Regarding triggers, currently only ProcessingTimeTrigger is listed as
>> supported in the Capability Matrix
>> <https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what>.
>> Can we expect support for more triggers sometime soon? Also, are the
>> GroupByKey and accumulating panes features supported for Spark streaming
>> pipelines?
>>
>> 3. According to the documentation, the storage level
>> <https://beam.apache.org/documentation/runners/spark/#pipeline-options-for-the-spark-runner>
>> is set to IN_MEMORY for streaming pipelines. Can we configure it to use
>> disk as well?
>>
>> 4. Is checkpointing supported for the Spark runner? In case a Beam
>> pipeline fails unexpectedly, can we read the state from the last run?
>>
>> It would be great if someone could help with the above.
>>
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
Thanks & Regards,
Nishu Tayal
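For reference, JB's answer to point 3 (a configurable default storage level) and the checkpointing question in point 4 both map to Spark runner pipeline options. Below is a minimal sketch, assuming the `storageLevel` and `checkpointDir` options described in the Spark runner documentation linked above; the exact option names and defaults should be verified against your Beam release.

```java
// Sketch only: configuring the Beam Spark runner's storage level and
// checkpoint directory via pipeline options. Option names are taken from
// the Spark runner documentation; verify against your Beam version.
import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SparkRunnerConfig {
  public static void main(String[] args) {
    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);

    // Allow RDDs to spill to disk instead of the default in-memory-only
    // storage level (point 3).
    options.setStorageLevel("MEMORY_AND_DISK");

    // A durable checkpoint location so a restarted streaming pipeline can
    // recover state from its last run (point 4). The path here is only an
    // example.
    options.setCheckpointDir("hdfs:///beam/checkpoints/my-pipeline");

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline here ...
    pipeline.run();
  }
}
```

The same settings can also be passed on the command line, e.g. `--runner=SparkRunner --storageLevel=MEMORY_AND_DISK --checkpointDir=hdfs:///beam/checkpoints/my-pipeline`, since `PipelineOptionsFactory.fromArgs` parses them from the program arguments.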
