Amit, I suppose JB is talking about the RDD-based version, so there is
no need to worry about SparkSession or other incompatible APIs.

Remember, the idea we are discussing is to have both the Spark 1 and
Spark 2 runners in master, both using the RDD-based translation. At the
same time we can have a feature branch to evolve the Dataset-based
translator (this one will replace the RDD-based translator for Spark 2
once it is mature).

The advantages have already been discussed, as well as the possible
issues, so I think we now have to see whether JB's idea is feasible and
how hard it would be to live with this while the Dataset version evolves.

I think what we are trying to avoid is a long-lived branch for a
Spark 2 runner based on RDD, because the maintenance burden would be
even worse. We would have to fight not only with the double merge of
fixes (in case the profile idea does not work), but also with the
continued evolution of Beam, and we would end up in the long-lived
branch mess that other runners have dealt with (e.g. the Apex runner).

What do you think about this, Amit? Would you be OK to go with it if
JB's profile idea proves to help with the maintenance issues?


On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <> wrote:
> hbase-spark module doesn't use SparkSession. So situation there is simpler
> :-)
> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <> wrote:
>> I'm still wondering how we'll do this - it's not just different
>> implementations of the same Class, but completely different concepts such
>> as using SparkSession in Spark 2 instead of SparkContext/StreamingContext
>> in Spark 1.
>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <> wrote:
>> > I have done some work over in HBASE-16179 where compatibility modules are
>> > created to isolate changes in Spark 2.x API so that code in hbase-spark
>> > module can be reused.
>> >
>> > FYI
>> >
