This is what I did over HBASE-16179:

-    f.call((asJavaIterator(it), conn)).iterator()
+    // the return type is different in spark 1.x & 2.x, we handle both cases
+    f.call(asJavaIterator(it), conn) match {
+      // spark 1.x
+      case iterable: Iterable[R] => iterable.iterator()
+      // spark 2.x
+      case iterator: Iterator[R] => iterator
+    }
     )
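The same pattern can be sketched as a standalone helper (the names below are illustrative, not the actual HBASE-16179 code): match on the runtime type of whatever the user function returned, so one call site works whether the Java API hands back an Iterable (Spark 1.x) or an Iterator (Spark 2.x).

```scala
import scala.collection.JavaConverters._

// Hypothetical compatibility helper, not the actual HBASE-16179 code:
// normalize the result of a user function to a Scala Iterator, whichever
// shape the installed Spark's Java API returned.
object FlatMapCompat {
  def normalize[R](result: AnyRef): Iterator[R] = result match {
    // Spark 2.x shape: the function already returned an iterator
    case it: java.util.Iterator[R @unchecked] => it.asScala
    // Spark 1.x shape: the function returned an iterable, take its iterator
    case it: java.lang.Iterable[R @unchecked] => it.iterator().asScala
  }
}
```

Because the match is on erased runtime classes, this compiles against either Spark version's API without a compile-time dependency on both.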
FYI

On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela <amitsel...@gmail.com> wrote:
> Just tried to replace the dependencies and see what happens:
>
> Most required changes are about the runner using deprecated Spark APIs, and
> after fixing them the only real issue is with the Java API for
> Pair/FlatMapFunction that changed its return value to Iterator (in 1.6 it's
> Iterable).
>
> So I'm not sure that a profile that simply sets the dependency on
> 1.6.3/2.1.0 is feasible.
>
> On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant <kobi.sal...@gmail.com> wrote:
> > So, if everything is in place in Spark 2.X and we use provided
> > dependencies for Spark in Beam, theoretically you can run the same code
> > on 2.X without any need for a branch?
> >
> > 2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:
> > > If StreamingContext is valid and we don't have to use SparkSession, and
> > > Accumulators are valid as well and we don't need AccumulatorsV2, I don't
> > > see a reason this shouldn't work (which means there are still tons of
> > > reasons this could break, but I can't think of them off the top of my
> > > head right now).
> > >
> > > @JB simply add a profile for the Spark dependencies and run the tests -
> > > you'll have a very definitive answer ;-)
> > > If this passes, try on a cluster running Spark 2 as well.
> > >
> > > Let me know if I can assist.
> > >
> > > On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > wrote:
> > > > Hi guys,
> > > >
> > > > Ismaël summarized well what I have in mind.
> > > >
> > > > I'm a bit late on the PoC around that (I started a branch already).
> > > > I will move forward over the weekend.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> > > > > Amit, I suppose JB is talking about the RDD-based version, so no need
> > > > > to worry about SparkSession or different incompatible APIs.
> > > > >
> > > > > Remember, the idea we are discussing is to have in master both the
> > > > > Spark 1 and Spark 2 runners using the RDD-based translation. At the
> > > > > same time we can have a feature branch to evolve the DataSet-based
> > > > > translator (this one will replace the RDD-based translator for
> > > > > Spark 2 once it is mature).
> > > > >
> > > > > The advantages have already been discussed, as well as the possible
> > > > > issues, so I think we have to see now if JB's idea is feasible and
> > > > > how hard it would be to live with this while the DataSet version
> > > > > evolves.
> > > > >
> > > > > I think what we are trying to avoid is to have a long-living branch
> > > > > for a Spark 2 runner based on RDDs, because the maintenance burden
> > > > > would be even worse. We would have to fight not only with the double
> > > > > merge of fixes (in case the profile idea does not work), but also
> > > > > with the continued evolution of Beam, and we would end up in the
> > > > > long-living branch mess that other runners have dealt with (e.g. the
> > > > > Apex runner):
> > > > >
> > > > > https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce5416c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
> > > > >
> > > > > What do you think about this, Amit? Would you be OK to go with it if
> > > > > JB's profile idea proves to help with the maintenance issues?
> > > > >
> > > > > Ismaël
> > > > >
> > > > > On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > >> The hbase-spark module doesn't use SparkSession.
> > > > >> So the situation there is simpler :-)
> > > > >>
> > > > >> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com>
> > > > >> wrote:
> > > > >>> I'm still wondering how we'll do this - it's not just different
> > > > >>> implementations of the same Class, but completely different
> > > > >>> concepts, such as using SparkSession in Spark 2 instead of
> > > > >>> SparkContext/StreamingContext in Spark 1.
> > > > >>>
> > > > >>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
> > > > >>>> I have done some work over in HBASE-16179, where compatibility
> > > > >>>> modules are created to isolate changes in the Spark 2.x API so
> > > > >>>> that code in the hbase-spark module can be reused.
> > > > >>>>
> > > > >>>> FYI

> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
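The profile idea discussed in the thread could look roughly like the sketch below. This is a minimal illustration, not Beam's actual build: only the versions (1.6.3 / 2.1.0) and the provided scope come from the thread, and the property and profile names are made up.

```xml
<!-- Default to Spark 1.6.3; activate -Pspark2 to build against 2.1.0.
     Note the Scala binary suffix (_2.10 vs _2.11) may also need to be
     switched per profile, which the thread does not address. -->
<properties>
  <spark.version>1.6.3</spark.version>
</properties>

<profiles>
  <profile>
    <id>spark2</id>
    <properties>
      <spark.version>2.1.0</spark.version>
    </properties>
  </profile>
</profiles>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
    <!-- provided: the cluster supplies Spark, as Kobi's question assumes -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```

A profile like this only swaps the dependency version; as Amit's Mar 29 message notes, it cannot paper over source-incompatible API changes such as the Pair/FlatMapFunction return type.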