Hi,

I wrote a new runner supporting Spark 2.1.x and changed the code accordingly. I'm still on vacation this week; I will send an update when I am back.

Regards
JB

On Aug 21, 2017, at 09:01, Pei HE <pei...@gmail.com> wrote:
> Any updates on upgrading to Spark 2.x?
>
> I tried to replace the dependency and found a compile error from implementing a Scala trait:
>
>     org.apache.beam.runners.spark.io.SourceRDD.SourcePartition is not abstract
>     and does not override abstract method
>     org$apache$spark$Partition$$super$equals(java.lang.Object) in
>     org.apache.spark.Partition
>
> (The Spark-side change was introduced in https://github.com/apache/spark/pull/12157.)
>
> Does anyone have ideas about this compile error?

On Wed, May 3, 2017 at 1:32 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Ted,
>
> My branch used Spark 2.1.0 and I just updated it to 2.1.1.
>
> As discussed with Aviem, I should be able to create the pull request later today.
>
> Regards
> JB

On 05/03/2017 02:50 AM, Ted Yu wrote:
> Spark 2.1.1 has been released.
>
> Consider using the new release in this work.
>
> Thanks

On Wed, Mar 29, 2017 at 5:43 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Cool for the PR merge, I will rebase my branch on it.
>
> Thanks!
> Regards
> JB

On 03/29/2017 01:58 PM, Amit Sela wrote:
> @Ted definitely makes sense.
> @JB I'm merging https://github.com/apache/beam/pull/2354 soon, so any deprecated Spark API issues should be resolved.
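[Editor's note] For context on the compile error above: the Spark-side change (https://github.com/apache/spark/pull/12157) added an `equals` override to the `Partition` trait that calls `super.equals`, and scalac turns that call into an abstract "super accessor" method which every implementing class must provide. Since `$` is legal in Java identifiers, one possible workaround is to implement the mangled method directly. A minimal self-contained sketch, using a stand-in interface rather than the real `org.apache.spark.Partition`:

```java
// Stand-in for what scalac roughly generates from Spark 2's Partition
// trait (an assumption for illustration only; the real trait lives in
// spark-core).
interface PartitionLike {
    int index();

    // scalac emits an abstract "super accessor" for the trait body's
    // `super.equals(other)` call; implementing classes must provide it.
    boolean org$apache$spark$Partition$$super$equals(Object other);
}

public class SourcePartitionSketch implements PartitionLike {
    private final int idx;

    public SourcePartitionSketch(int idx) {
        this.idx = idx;
    }

    @Override
    public int index() {
        return idx;
    }

    // '$' is a legal character in Java identifiers, so the mangled
    // accessor can be implemented directly by delegating to
    // Object.equals (reference equality), which is what the trait's
    // `super.equals` call means.
    @Override
    public boolean org$apache$spark$Partition$$super$equals(Object other) {
        return super.equals(other);
    }

    public static void main(String[] args) {
        SourcePartitionSketch p = new SourcePartitionSketch(0);
        System.out.println(p.org$apache$spark$Partition$$super$equals(p)); // true
    }
}
```

Whether this is preferable to a small Scala shim class that extends `Partition` for Java code to subclass is a design choice; the mangled name is a scalac implementation detail, not a stable API.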
On Wed, Mar 29, 2017 at 2:46 PM Ted Yu <yuzhih...@gmail.com> wrote:
> This is what I did over in HBASE-16179:
>
>     - f.call((asJavaIterator(it), conn)).iterator()
>     + // the return type is different in spark 1.x & 2.x, we handle both cases
>     + f.call(asJavaIterator(it), conn) match {
>     +   // spark 1.x
>     +   case iterable: Iterable[R] => iterable.iterator()
>     +   // spark 2.x
>     +   case iterator: Iterator[R] => iterator
>     + }
>       )
>
> FYI

On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela <amitsel...@gmail.com> wrote:
> I just tried to replace the dependencies to see what happens.
>
> Most of the required changes are about the runner using deprecated Spark APIs, and after fixing them the only real issue is with the Java API for Pair/FlatMapFunction, which changed its return value to Iterator (in 1.6 it is Iterable).
>
> So I'm not sure that a profile that simply sets the dependency to 1.6.3/2.1.0 is feasible.

On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant <kobi.sal...@gmail.com> wrote:
> So, if everything is in place in Spark 2.X and we use provided dependencies for Spark in Beam, theoretically you can run the same code on 2.X without any need for a branch?
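[Editor's note] Ted's Scala `match` above can be mirrored on the Java side with `instanceof` checks, which is one way shared runner code could cope with the `FlatMapFunction` contract change Amit mentions (Iterable in 1.6, Iterator in 2.x). A hedged sketch with illustrative names, not Beam's actual code:

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative adapter (not Beam's actual code): normalize the result of
// a user flat-map function so the runner can target both Spark contracts.
public class FlatMapResultAdapter {

    // Spark 1.6's FlatMapFunction returns Iterable; Spark 2.x's returns
    // Iterator. Accept either shape and hand back an Iterator.
    @SuppressWarnings("unchecked")
    public static <T> Iterator<T> toIterator(Object result) {
        if (result instanceof Iterable) {      // spark 1.x style
            return ((Iterable<T>) result).iterator();
        }
        if (result instanceof Iterator) {      // spark 2.x style
            return (Iterator<T>) result;
        }
        throw new IllegalArgumentException(
            "expected Iterable or Iterator, got: " + result);
    }

    public static void main(String[] args) {
        // Both shapes normalize to the same Iterator view.
        Iterator<String> fromIterable = toIterator(Arrays.asList("a", "b"));
        Iterator<String> fromIterator =
            toIterator(Arrays.asList("a", "b").iterator());
        while (fromIterable.hasNext()) {
            System.out.println(fromIterable.next());
        }
        while (fromIterator.hasNext()) {
            System.out.println(fromIterator.next());
        }
    }
}
```

Note this only papers over the return type; the method signatures of `FlatMapFunction` itself still differ between the two artifacts at compile time, which is the crux of Amit's doubt about a dependency-only profile.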
2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:
> If StreamingContext is valid and we don't have to use SparkSession, and Accumulators are valid as well and we don't need AccumulatorsV2, I don't see a reason this shouldn't work (which means there are still tons of reasons this could break, but I can't think of them off the top of my head right now).
>
> @JB simply add a profile for the Spark dependencies and run the tests - you'll have a very definitive answer ;-) . If this passes, try it on a cluster running Spark 2 as well.
>
> Let me know if I can assist.

On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi guys,
>
> Ismaël summarized well what I have in mind.
>
> I'm a bit late on the PoC around that (I already started a branch). I will move forward over the weekend.
>
> Regards
> JB

On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> Amit, I suppose JB is talking about the RDD-based version, so no need to worry about SparkSession or different incompatible APIs.
>
> Remember, the idea we are discussing is to have both the Spark 1 and Spark 2 runners in master using the RDD-based translation. At the same time we can have a feature branch to evolve the DataSet-based translator (which will replace the RDD-based translator for Spark 2 once it is mature).
>
> The advantages have already been discussed, as well as the possible issues, so I think we now have to see whether JB's idea is feasible and how hard it would be to live with this while the DataSet version evolves.
>
> I think what we are trying to avoid is a long-living branch for a Spark 2 runner based on RDDs, because the maintenance burden would be even worse. We would have to fight not only the double merge of fixes (in case the profile idea does not work), but also the continued evolution of Beam, and we would end up in the long-living-branch mess that other runners have dealt with (e.g. the Apex runner):
>
> https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
>
> What do you think about this, Amit? Would you be OK to go with it if JB's profile idea proves to help with the maintenance issues?
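[Editor's note] The profile idea Amit and JB are discussing could look roughly like this in a runner pom.xml (a sketch only; version numbers, the property name, and the Scala binary suffix are illustrative assumptions, not the actual Beam build):

```xml
<!-- Sketch only: select the provided Spark version per build. -->
<profiles>
  <profile>
    <id>spark-1</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <properties>
      <spark.version>1.6.3</spark.version>
    </properties>
  </profile>
  <profile>
    <id>spark-2</id>
    <properties>
      <spark.version>2.1.1</spark.version>
    </properties>
  </profile>
</profiles>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

As Amit notes, a profile like this only switches the provided dependency; it cannot hide source-incompatible API changes such as the Iterator/Iterable one, and the Scala binary suffix of the artifacts may also differ between the two dependency sets.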
On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> The hbase-spark module doesn't use SparkSession, so the situation there is simpler :-)

On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com> wrote:
> I'm still wondering how we'll do this - it's not just different implementations of the same class, but completely different concepts, such as using SparkSession in Spark 2 instead of SparkContext/StreamingContext in Spark 1.

On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
> I have done some work over in HBASE-16179, where compatibility modules are created to isolate changes in the Spark 2.x API so that code in the hbase-spark module can be reused.
> FYI

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
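[Editor's note] The compatibility-module approach Ted describes for HBASE-16179 amounts to routing shared code to a per-version shim, with each shim compiled against its own Spark API. A trivial, self-contained sketch of the version-routing part (all names hypothetical, not HBase's or Beam's actual code):

```java
// Hypothetical version router (illustrative names only): shared code asks
// which compatibility module to bind, and each module isolates the
// version-specific Spark API calls behind a common interface.
public class SparkCompatLoader {

    // Map a Spark version string to the (hypothetical) compat module
    // that was compiled against that API.
    public static String compatModuleFor(String sparkVersion) {
        if (sparkVersion.startsWith("1.")) {
            // Iterable-returning FlatMapFunction, old Accumulator API, ...
            return "spark1-compat";
        }
        // Iterator-returning FlatMapFunction, AccumulatorV2, SparkSession, ...
        return "spark2-compat";
    }

    public static void main(String[] args) {
        System.out.println(compatModuleFor("1.6.3")); // spark1-compat
        System.out.println(compatModuleFor("2.1.1")); // spark2-compat
    }
}
```

In a real build the two modules would be separate Maven artifacts, and only the one matching the provided Spark version would be placed on the classpath.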