I don't know enough about DSv2 to comment on this part, but any theoretical 2.5 is still a ways off. Would waiting for 3.0 to 'stabilize' it as much as possible help?
I say that because, regarding Java 11, the main breaking changes are probably the Hive 2 / Hadoop 3 dependency, JPMML (minor), the general classloader changes, and the handling of off-heap memory. These aren't big breaks, but they are probably going to break some things. I think we'd want to see a 'proof of concept' branch to evaluate just how much has to change to get it working, and that is why I think a 2.5 release would still need more investigation.

On Fri, Sep 20, 2019 at 1:19 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
> > DSv2 is far from stable right?
>
> No, I think it is reasonably stable and very close to being ready for a
> release.
>
> > All the actual data types are unstable and you guys have completely
> ignored that.
>
> I think what you're referring to is the use of `InternalRow`. That's a
> stable API and there has been no work to avoid using it. In any case, I
> don't think that anyone is suggesting that we delay 3.0 until a replacement
> for `InternalRow` is added, right?
>
> While I understand the motivation for a better solution here, I think the
> pragmatic solution is to continue using `InternalRow`.
>
> > If the goal is to make DSv2 work across 3.x and 2.x, that seems too
> invasive of a change to backport once you consider the parts needed to make
> dsv2 stable.
>
> I believe that those of us working on DSv2 are confident about the current
> stability. We set goals for what to get into the 3.0 release months ago and
> have very nearly reached the point where we are ready for that release.
>
> I don't think instability would be a problem in maintaining compatibility
> between the 2.5 version and the 3.0 version. If we find that we need to
> make API changes (other than additions) then we can make those in the 3.1
> release. Because the goals we set for the 3.0 release have been reached
> with the current API and if we are ready to release 3.0, we can release a
> 2.5 with the same API.
>
> On Fri, Sep 20, 2019 at 11:05 AM Reynold Xin <r...@databricks.com> wrote:
>
>> DSv2 is far from stable right? All the actual data types are unstable and
>> you guys have completely ignored that. We'd need to work on that and that
>> will be a breaking change. If the goal is to make DSv2 work across 3.x and
>> 2.x, that seems too invasive of a change to backport once you consider the
>> parts needed to make dsv2 stable.
>>
>>
>>
>> On Fri, Sep 20, 2019 at 10:47 AM, Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> In the DSv2 sync this week, we talked about a possible Spark 2.5 release
>>> based on the latest Spark 2.4, but with DSv2 and Java 11 support added.
>>>
>>> A Spark 2.5 release with these two additions will help people migrate to
>>> Spark 3.0 when it is released because they will be able to use a single
>>> implementation for DSv2 sources that works in both 2.5 and 3.0. Similarly,
>>> upgrading to 3.0 won't also require also updating to Java 11 because users
>>> could update to Java 11 with the 2.5 release and have fewer major changes.
>>>
>>> Another reason to consider a 2.5 release is that many people are
>>> interested in a release with the latest DSv2 API and support for DSv2 SQL.
>>> I'm already going to be backporting DSv2 support to the Spark 2.4 line, so
>>> it makes sense to share this work with the community.
>>>
>>> This release line would just consist of backports like DSv2 and Java 11
>>> that assist compatibility, to keep the scope of the release small. The
>>> purpose is to assist people moving to 3.0 and not distract from the 3.0
>>> release.
>>>
>>> Would a Spark 2.5 release help anyone else? Are there any concerns about
>>> this plan?
>>>
>>>
>>> rb
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>