Strictly speaking, data source v2 is always half-finished until we mark it as stable. We need some small milestones to move forward step by step.
The redesign is also happening incrementally. SPARK-24882 mostly focuses on the "RDD" part of the API: the separation of reader factory and input partitions, the introduction of ScanConfig, etc. Next, we are focusing on the high-level abstraction and want to change the "table" part of the API. In my understanding, each PR should be self-contained. If we are OK with having SPARK-24882 in master as an individual commit, I think it's also OK to have it in branch 2.4.

I've created https://issues.apache.org/jira/browse/SPARK-25390 to track the new abstraction. It doesn't change the API much, but it updates the streaming execution engine quite a bit.

Thanks,
Wenchen

On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue <rb...@netflix.com> wrote:

> Wenchen, can you hold off on the first RC?
>
> The half-finished changes from the redesign of the DataSourceV2 API are in
> master, added in SPARK-24882 <https://github.com/apache/spark/pull/22009>,
> and are now in the 2.4 branch. We've had a lot of good discussion since
> that PR was merged to update and fix the design, plus only one of the
> follow-ups on SPARK-25186
> <https://issues.apache.org/jira/browse/SPARK-25186> is done. Clearly, the
> redesign was too large to get into 2.4 in so little time -- it was proposed
> about 10 days before the original branch date -- and I don't think it is a
> good idea to release half-finished major changes.
>
> The easiest solution is to revert SPARK-24882 in the release branch. That
> way we have minor changes in 2.4 and major changes in the next release,
> instead of major changes in both. What does everyone think?
>
> rb
>
> On Fri, Sep 7, 2018 at 10:37 AM shane knapp <skn...@berkeley.edu> wrote:
>
>> ++joshrosen (thanks for the help w/deploying the jenkins configs)
>>
>> the basic 2.4 builds are deployed and building!
>>
>> i haven't created (a) build(s) yet for scala 2.12... i'll be
>> coordinating this w/the databricks folks next week.
>>
>> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Thank you, Shane! :D
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp <skn...@berkeley.edu> wrote:
>>>
>>>> i'll try and get to the 2.4 branch stuff today...
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>
> --
> Ryan Blue
> Software Engineer
> Netflix