+1 for the idea of committing the work earlier. I think we will start the vote soon. Once it passes, we can submit the PRs.
What do you think?

Anton

On Mon, Nov 1, 2021 at 7:59 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
> The general idea looks great. This is indeed a complicated API, and we
> probably need more time to evaluate its design. It's better to commit
> this work earlier so that we have more time to verify it before the 3.3
> release. Maybe we can commit the group-based API first, then the
> delta-based one, as the delta-based API is significantly more convoluted.
>
> On Thu, Oct 28, 2021 at 12:53 AM L. C. Hsieh <vii...@apache.org> wrote:
>>
>> Thanks for the initial feedback.
>>
>> Previously the community was busy with work related to the Spark 3.2
>> release. Now that 3.2 is out, I'd like to bring this up again and seek
>> more discussion and feedback.
>>
>> Thanks.
>>
>> On 2021/06/25 15:49:49, huaxin gao <huaxin.ga...@gmail.com> wrote:
>> > I took a quick look at the PR, and it looks like a great feature to
>> > have. It provides unified APIs for data sources to perform these
>> > commonly used operations easily and efficiently, so users don't have
>> > to implement custom extensions on their own. Thanks Anton for the work!
>> >
>> > On Thu, Jun 24, 2021 at 9:42 PM L. C. Hsieh <vii...@apache.org> wrote:
>> >
>> > > Thanks Anton. I volunteer to be the shepherd of this SPIP. This is
>> > > also my first time shepherding a SPIP, so please let me know if
>> > > there is anything I can improve.
>> > >
>> > > These look like great features, and the rationale in the proposal
>> > > makes sense. These operations are becoming more common and more
>> > > important in big data workloads. Instead of individual data sources
>> > > building custom extensions, it makes more sense for Spark to
>> > > support the API.
>> > >
>> > > Please share your thoughts about the proposal and the design. I
>> > > appreciate your feedback. Thank you!
>> > >
>> > > On 2021/06/24 23:53:32, Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>> > > > Hey everyone,
>> > > >
>> > > > I'd like to start a discussion on adding support for executing
>> > > > row-level operations such as DELETE, UPDATE, and MERGE for v2
>> > > > tables (SPARK-35801). The execution should be the same across
>> > > > data sources, and the best way to achieve that is to implement
>> > > > it in Spark.
>> > > >
>> > > > Right now, Spark can only parse and, to some extent, analyze
>> > > > DELETE, UPDATE, and MERGE commands. Data sources that support
>> > > > row-level changes have to build custom Spark extensions to
>> > > > execute such statements. The goal of this effort is to come up
>> > > > with a flexible and easy-to-use API that will work across data
>> > > > sources.
>> > > >
>> > > > Design doc:
>> > > > https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
>> > > >
>> > > > PR for handling DELETE statements:
>> > > > https://github.com/apache/spark/pull/33008
>> > > >
>> > > > Any feedback is more than welcome.
>> > > >
>> > > > Liang-Chi was kind enough to shepherd this effort. Thanks!
>> > > >
>> > > > - Anton
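[Editor's note: for readers following the group-based vs. delta-based distinction discussed above: a group-based operation rewrites whole groups of rows (e.g. files or partitions) that contain matches, while a delta-based operation emits only the changed rows. Below is a minimal sketch of what such a connector API could look like. All interface and method names here are illustrative assumptions based on the direction of the thread, not the final API from the design doc; only the imported types are existing Spark classes.

  // Illustrative sketch only -- names are assumptions, not the final API.
  import org.apache.spark.sql.connector.read.ScanBuilder;
  import org.apache.spark.sql.connector.write.LogicalWriteInfo;
  import org.apache.spark.sql.connector.write.WriteBuilder;
  import org.apache.spark.sql.util.CaseInsensitiveStringMap;

  // The SQL commands a row-level operation can represent.
  enum Command { DELETE, UPDATE, MERGE }

  // A table mixes this in to declare it can execute DELETE/UPDATE/MERGE.
  interface SupportsRowLevelOperations {
    RowLevelOperation newRowLevelOperation(Command command,
                                           CaseInsensitiveStringMap options);
  }

  // Group-based contract: the scan locates the groups (e.g. files or
  // partitions) that contain matching rows; the write replaces those
  // groups wholesale with the updated data.
  interface RowLevelOperation {
    Command command();
    ScanBuilder newScanBuilder(CaseInsensitiveStringMap options);
    WriteBuilder newWriteBuilder(LogicalWriteInfo info);
  }

  // Delta-based contract (the more convoluted one, per the thread): the
  // source writes only the changed rows, identified by row ID columns,
  // instead of rewriting whole groups.
  interface SupportsDelta extends RowLevelOperation {
    String[] rowId();  // columns that uniquely identify a row
  }

In a design along these lines, a copy-on-write source would implement only RowLevelOperation, while a merge-on-read source could additionally implement SupportsDelta to avoid rewriting unchanged rows.]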