+1 for the idea of committing the work earlier. I think we will start the vote soon. Once it passes, we can submit the PRs.
What do you think?

Anton

On Mon, Nov 1, 2021 at 7:59 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
> The general idea looks great. This is indeed a complicated API, and we
> probably need more time to evaluate its design. It's better to commit
> this work earlier so that we have more time to verify it before the 3.3
> release. Maybe we can commit the group-based API first, then the
> delta-based one, as the delta-based API is significantly more convoluted.
>
> On Thu, Oct 28, 2021 at 12:53 AM L. C. Hsieh <vii...@apache.org> wrote:
>>
>> Thanks for the initial feedback.
>>
>> Previously the community was busy with work related to the Spark 3.2
>> release. Now that 3.2 is out, I'd like to bring this up again and seek
>> more discussion and feedback.
>>
>> Thanks.
>>
>> On 2021/06/25 15:49:49, huaxin gao <huaxin.ga...@gmail.com> wrote:
>> > I took a quick look at the PR, and it looks like a great feature to
>> > have. It provides unified APIs for data sources to perform these
>> > commonly used operations easily and efficiently, so users don't have
>> > to implement custom extensions on their own. Thanks Anton for the work!
>> >
>> > On Thu, Jun 24, 2021 at 9:42 PM L. C. Hsieh <vii...@apache.org> wrote:
>> >
>> > > Thanks Anton. I volunteer to be the shepherd of this SPIP. This is
>> > > also my first time shepherding a SPIP, so please let me know if
>> > > there is anything I can improve.
>> > >
>> > > These look like great features, and the rationale in the proposal
>> > > makes sense. These operations are becoming more common and more
>> > > important in big data workloads. Instead of individual data sources
>> > > building custom extensions, it makes more sense for Spark to
>> > > support the API.
>> > >
>> > > Please share your thoughts about the proposal and the design. I
>> > > appreciate your feedback. Thank you!
>> > >
>> > > On 2021/06/24 23:53:32, Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>> > > > Hey everyone,
>> > > >
>> > > > I'd like to start a discussion on adding support for executing
>> > > > row-level operations such as DELETE, UPDATE, and MERGE for v2
>> > > > tables (SPARK-35801). The execution should be the same across
>> > > > data sources, and the best way to achieve that is to implement
>> > > > it in Spark.
>> > > >
>> > > > Right now, Spark can only parse and, to some extent, analyze
>> > > > DELETE, UPDATE, and MERGE commands. Data sources that support
>> > > > row-level changes have to build custom Spark extensions to
>> > > > execute such statements. The goal of this effort is to come up
>> > > > with a flexible and easy-to-use API that will work across data
>> > > > sources.
>> > > >
>> > > > Design doc:
>> > > > https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
>> > > >
>> > > > PR for handling DELETE statements:
>> > > > https://github.com/apache/spark/pull/33008
>> > > >
>> > > > Any feedback is more than welcome.
>> > > >
>> > > > Liang-Chi was kind enough to shepherd this effort. Thanks!
>> > > >
>> > > > - Anton
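[Editor's note: for readers following the group-based vs. delta-based distinction discussed above: a group-based operation rewrites whole groups of rows (e.g. files or partitions) that contain matches, while a delta-based operation emits only the changed rows. Below is a minimal sketch of what such a connector API could look like. All interface and method names here are illustrative assumptions based on the direction of the thread, not the final API from the design doc; only the imported types are existing Spark classes.

  // Illustrative sketch only -- names are assumptions, not the final API.
  import org.apache.spark.sql.connector.read.ScanBuilder;
  import org.apache.spark.sql.connector.write.LogicalWriteInfo;
  import org.apache.spark.sql.connector.write.WriteBuilder;
  import org.apache.spark.sql.util.CaseInsensitiveStringMap;

  // The SQL commands a row-level operation can represent.
  enum Command { DELETE, UPDATE, MERGE }

  // A table mixes this in to declare it can execute DELETE/UPDATE/MERGE.
  interface SupportsRowLevelOperations {
    RowLevelOperation newRowLevelOperation(Command command,
                                           CaseInsensitiveStringMap options);
  }

  // Group-based contract: the scan locates the groups (e.g. files or
  // partitions) that contain matching rows; the write replaces those
  // groups wholesale with the updated data.
  interface RowLevelOperation {
    Command command();
    ScanBuilder newScanBuilder(CaseInsensitiveStringMap options);
    WriteBuilder newWriteBuilder(LogicalWriteInfo info);
  }

  // Delta-based contract (the more convoluted one, per the thread): the
  // source writes only the changed rows, identified by row ID columns,
  // instead of rewriting whole groups.
  interface SupportsDelta extends RowLevelOperation {
    String[] rowId();  // columns that uniquely identify a row
  }

In a design along these lines, a copy-on-write source would implement only RowLevelOperation, while a merge-on-read source could additionally implement SupportsDelta to avoid rewriting unchanged rows.]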