Spark 2.1.1 has been released. Consider using the new release in this work.
Thanks

On Wed, Mar 29, 2017 at 5:43 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Cool for the PR merge, I will rebase my branch on it.
>
> Thanks!
> Regards
> JB
>
> On 03/29/2017 01:58 PM, Amit Sela wrote:
>
>> @Ted definitely makes sense.
>> @JB I'm merging https://github.com/apache/beam/pull/2354 soon so any
>> deprecated Spark API issues should be resolved.
>>
>> On Wed, Mar 29, 2017 at 2:46 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> This is what I did over HBASE-16179:
>>>
>>> -        f.call((asJavaIterator(it), conn)).iterator()
>>> +        // the return type is different in spark 1.x & 2.x, we handle both cases
>>> +        f.call(asJavaIterator(it), conn) match {
>>> +          // spark 1.x
>>> +          case iterable: Iterable[R] => iterable.iterator()
>>> +          // spark 2.x
>>> +          case iterator: Iterator[R] => iterator
>>> +        }
>>>        )
>>>
>>> FYI
>>>
>>> On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela <amitsel...@gmail.com> wrote:
>>>
>>>> Just tried to replace dependencies and see what happens:
>>>>
>>>> Most required changes are about the runner using deprecated Spark APIs,
>>>> and after fixing them the only real issue is with the Java API for
>>>> Pair/FlatMapFunction that changed return value to Iterator (in 1.6 it's
>>>> Iterable).
>>>>
>>>> So I'm not sure that a profile that simply sets dependency on
>>>> 1.6.3/2.1.0 is feasible.
>>>>
>>>> On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant <kobi.sal...@gmail.com> wrote:
>>>>
>>>>> So, if everything is in place in Spark 2.X and we use provided
>>>>> dependencies for Spark in Beam, theoretically you can run the same
>>>>> code in 2.X without any need for a branch?
>>>>>
>>>>> 2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:
>>>>>
>>>>>> If StreamingContext is valid and we don't have to use SparkSession,
>>>>>> and Accumulators are valid as well and we don't need AccumulatorV2,
>>>>>> I don't see a reason this shouldn't work (which means there are
>>>>>> still tons of reasons this could break, but I can't think of them
>>>>>> off the top of my head right now).
>>>>>>
>>>>>> @JB simply add a profile for the Spark dependencies and run the
>>>>>> tests - you'll have a very definitive answer ;-).
>>>>>> If this passes, try on a cluster running Spark 2 as well.
>>>>>>
>>>>>> Let me know if I can assist.
>>>>>>
>>>>>> On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> Ismaël summarized well what I have in mind.
>>>>>>>
>>>>>>> I'm a bit late on the PoC around that (I started a branch already).
>>>>>>> I will move forward over the weekend.
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
>>>>>>>
>>>>>>>> Amit, I suppose JB is talking about the RDD based version, so no
>>>>>>>> need to worry about SparkSession or different incompatible APIs.
>>>>>>>>
>>>>>>>> Remember the idea we are discussing is to have in master both the
>>>>>>>> spark 1 and spark 2 runners using the RDD based translation. At
>>>>>>>> the same time we can have a feature branch to evolve the DataSet
>>>>>>>> based translator (this one will replace the RDD based translator
>>>>>>>> for spark 2 once it is mature).
>>>>>>>>
>>>>>>>> The advantages have already been discussed, as well as the
>>>>>>>> possible issues, so I think we have to see now if JB's idea is
>>>>>>>> feasible and how hard it would be to live with this while the
>>>>>>>> DataSet version evolves.
>>>>>>>>
>>>>>>>> I think what we are trying to avoid is to have a long-living
>>>>>>>> branch for a spark 2 runner based on RDD, because the maintenance
>>>>>>>> burden would be even worse. We would have to fight not only with
>>>>>>>> the double merge of fixes (in case the profile idea does not
>>>>>>>> work), but also with the continued evolution of Beam, and we
>>>>>>>> would end up in the long-living branch mess that other runners
>>>>>>>> have dealt with (e.g. the Apex runner):
>>>>>>>>
>>>>>>>> https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
>>>>>>>>
>>>>>>>> What do you think about this, Amit? Would you be ok to go with it
>>>>>>>> if JB's profile idea proves to help with the maintenance issues?
>>>>>>>>
>>>>>>>> Ismaël
>>>>>>>>
>>>>>>>> On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> hbase-spark module doesn't use SparkSession. So situation there
>>>>>>>>> is simpler :-)
>>>>>>>>>
>>>>>>>>> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I'm still wondering how we'll do this - it's not just different
>>>>>>>>>> implementations of the same Class, but completely different
>>>>>>>>>> concepts, such as using SparkSession in Spark 2 instead of
>>>>>>>>>> SparkContext/StreamingContext in Spark 1.
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have done some work over in HBASE-16179 where compatibility
>>>>>>>>>>> modules are created to isolate changes in Spark 2.x API so
>>>>>>>>>>> that code in hbase-spark module can be reused.
>>>>>>>>>>>
>>>>>>>>>>> FYI
>>>>>>>
>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbono...@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
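The core incompatibility Amit hit - Spark 1.x's Java `FlatMapFunction.call` returns an `Iterable<T>` while Spark 2.x's returns an `Iterator<T>` - can be bridged at runtime the same way Ted's Scala pattern match does. A minimal sketch of that idea in plain Java, with no Spark dependency; the `asIterator` helper and its call sites are hypothetical, not Beam or HBase code:

```java
import java.util.Arrays;
import java.util.Iterator;

public class IterAdapter {
    // Normalizes the result of a user function that may return either an
    // Iterable<T> (Spark 1.x FlatMapFunction contract) or an Iterator<T>
    // (Spark 2.x contract) - the Java analogue of the Scala match above.
    @SuppressWarnings("unchecked")
    static <T> Iterator<T> asIterator(Object result) {
        if (result instanceof Iterable) {        // Spark 1.x style
            return ((Iterable<T>) result).iterator();
        }
        if (result instanceof Iterator) {        // Spark 2.x style
            return (Iterator<T>) result;
        }
        throw new IllegalArgumentException("expected Iterable or Iterator");
    }

    public static void main(String[] args) {
        // 1.x-style result: an Iterable
        Iterator<Integer> fromIterable = asIterator(Arrays.asList(1, 2, 3));
        // 2.x-style result: already an Iterator, passed through unchanged
        Iterator<Integer> fromIterator = asIterator(Arrays.asList(4, 5).iterator());
        System.out.println(fromIterable.next() + " " + fromIterator.next()); // prints: 1 4
    }
}
```

The catch, as the thread implies, is that this only helps where the runner consumes the result; it does not change the `call` signature the user implements, which is why a single profile-switched source tree is harder than it first looks.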
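Amit's suggestion to "simply add a profile for the Spark dependencies", combined with Kobi's point about `provided` scope, would look roughly like this in a runner `pom.xml`. This is a sketch only: the profile ids are assumed, and only the versions mentioned in the thread (1.6.3/2.1.0) are taken from the discussion.

```xml
<!-- Sketch only: profile ids are illustrative, not the actual Beam build. -->
<profiles>
  <profile>
    <id>spark1</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <properties>
      <spark.version>1.6.3</spark.version>
    </properties>
  </profile>
  <profile>
    <id>spark2</id> <!-- activate with: mvn test -Pspark2 -->
    <properties>
      <spark.version>2.1.0</spark.version>
    </properties>
  </profile>
</profiles>

<!-- The dependency is declared once, provided-scoped as Kobi notes,
     so the cluster's Spark distribution supplies it at runtime. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Note that a profile like this only swaps versions; it cannot paper over source-incompatible signatures such as the `Iterable`/`Iterator` change, which is exactly Amit's reservation about feasibility.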
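The compatibility-module approach Ted describes for HBASE-16179 amounts to coding against a small version-neutral interface and shipping one implementation per Spark major version, selected at build time. A toy Java sketch of that pattern under stated assumptions: all names here are hypothetical, and no real Spark types are used so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Version-neutral facade the runner would code against. Hypothetical
// spark1-compat and spark2-compat modules would each implement it,
// isolating the incompatible API calls behind one seam.
interface SparkCompat {
    <T> List<T> parallelizeAndCollect(List<T> data);
}

// Stand-in for a spark1-compat implementation. A real one would go
// through JavaSparkContext; here we just copy the list so the sketch
// runs without any Spark dependency.
class Spark1Compat implements SparkCompat {
    @Override
    public <T> List<T> parallelizeAndCollect(List<T> data) {
        return new ArrayList<>(data);
    }
}

public class CompatDemo {
    public static void main(String[] args) {
        // Which implementation ends up on the classpath would be decided
        // by the active Maven profile, not by runtime logic.
        SparkCompat compat = new Spark1Compat();
        System.out.println(compat.parallelizeAndCollect(Arrays.asList("a", "b"))); // prints: [a, b]
    }
}
```

This is the same trade-off the thread circles around: the seam keeps one master branch buildable against both Spark lines, at the cost of maintaining a thin extra module per version.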