+1 on not creating a branch now. Rebasing and maintenance are too expensive
compared with the gain in development speed. Some additional thoughts below.

The row-delete feature should be behind a feature flag, which implies that it
should have minimal impact on the master branch when it is turned off. Working
on master avoids the pain of breaking master at branch merge, and effectively
works in a fail-fast, fail-early mode.

Working on the master branch will not prevent splitting the feature into small
items. Instead, it will encourage more people to work on it and help the
community stay focused on the master roadmap.

Finally, if we think about rebasing, it ends up being either too expensive or
easy. In the former case, we should not create a branch because it is hard to
keep in sync with master. In the latter case, the work has little impact on
master and there is no need for a branch.

Thanks!

Miao

From: Ryan Blue <rb...@netflix.com.INVALID>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, 
"rb...@netflix.com" <rb...@netflix.com>
Date: Tuesday, March 31, 2020 at 10:08 AM
To: OpenInx <open...@gmail.com>
Cc: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Open a new branch for row-delete feature ?

I'm fine starting a branch later if we do run into those issues, but I don't 
think it is a good idea to do it now in anticipation. All of the work that we 
can do on master we should try to do on master. We can start a branch when we 
need one.

On Mon, Mar 30, 2020 at 7:44 PM OpenInx <open...@gmail.com> wrote:
Hi Ryan

The reason I suggest opening a new dev branch for row-delete development is
that we will split the whole feature into many small issues, and each issue
will have a pull request with an appropriate amount of code, so that
contributors and reviewers can discuss one point at a time and iterate on this
feature faster. During implementation, we will ensure that v1 works after every
separate PR, but master may not be ready for cutting a release. For example,
when releasing 0.8.0, I'm sure we won't want the release to contain only part
of the v2 spec (such as providing sequence_number but no file_type). The Spark
reader/writer and the data/delete manifests may also need some refactoring,
which could be spread across several PRs. Splitting the work into multiple
pull requests may block the release of the new version for a certain period of
time, which is not what we want to see.

About maintaining the new branch: in my experience we could rebase the new
branch onto master periodically (say, every three days), so that each new pull
request for row-delete is designed against the newest changes. This should work
as long as master does not receive too many new changes, which is in line with
our current situation.

In this case, I weighed the maintenance costs of the new branch against the
delay of the row-delete feature. I think we should let row-delete go a little
faster (almost all community users are looking forward to this feature), and I
think the current maintenance cost is acceptable.

Thanks

On Tue, Mar 31, 2020 at 5:52 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
Sorry, I didn't address the suggestion to add a Flink branch as well. The work 
needed for the Flink sink is to remove parts that are specific to Netflix, so 
I'm not sure what the rationale for a branch would be. Is there a reason why 
this can't be done in master, but requires a shared branch? If multiple people 
want to contribute, why not contribute to the same PR?

A shared PR branch makes the most sense to me for this because it is regularly 
tested against master.

On Mon, Mar 30, 2020 at 2:48 PM Ryan Blue <rb...@netflix.com> wrote:
I think we may eventually want a branch, but I think it is too early to
create one now.

Branches are expensive. They require maintenance to stay in sync with master, 
usually copying changes from master into the branch with updates. Updating the 
changes to master for the branch is more difficult because it is usually not 
the original contributor or reviewer porting them. And it is better to catch 
problems between changes in master and the branch early.

I'm not against branches, but I don't want to create them unless they are 
valuable. In this case, I don't see the value. We plan to add v2 in parallel so 
you can still write v1 tables for compatibility, and most of the work that 
needs to be done -- like creating readers and writers for diff formats -- can 
be done in master.

rb

On Mon, Mar 30, 2020 at 9:00 AM Gautam <gautamkows...@gmail.com> wrote:
Thanks for bringing this up OpenInx.  That's a great idea: to open a separate 
branch for row-level deletes.

I would like to help support/contribute/review this as well. If there are
sub-tasks you guys have identified that can be added to
https://github.com/apache/incubator-iceberg/milestone/4
we can start taking those up too.

thanks for the good work,
- Gautam.



On Mon, Mar 30, 2020 at 8:39 AM Junjie Chen <chenjunjied...@gmail.com> wrote:
+1 to create the branch. Some row-level delete subtasks must be based on the 
sequence number as well as end to end tests.

On Fri, Mar 27, 2020 at 4:42 PM OpenInx <open...@gmail.com> wrote:
Dear Dev:

     On Tuesday, we had a sync meeting and discussed the following:
         1. cutting the 0.8.0 release;
         2. the Flink connector;
         3. Iceberg row-level delete;
         4. Map-Reduce formats and Hive support.

     We'll release version 0.8.0 around April 15, and the following 0.9.0 will
     be released in the next few months. On the other hand, Ryan, Junjie Chen
     and I have done three PoC versions for row-level deletes. We had a full
     discussion [4] and started the relevant code design. We're sure that the
     feature will introduce some incompatible specification changes, such as
     the sequence_number spec [1] and the file_type spec [2]; the sortedOrder
     feature also seems to be a breaking change [3].

     To avoid affecting the release of version 0.8.0 and to push the row-delete
     feature forward early, I suggest opening a new branch for the row-delete
     feature, named branch-1. Once the row-delete feature is stable, we could
     release 1.0.0. Or we can just open a row-delete feature branch and, once
     the work is done, merge it back to the master branch and continue with
     the 0.9.0 release.

     I guess the Flink connector devs are facing the same problem?

     What do you think about this?

     Thank you.


  [1]. https://github.com/apache/incubator-iceberg/pull/588
  [2]. https://github.com/apache/incubator-iceberg/issues/824
  [3]. https://github.com/apache/incubator-iceberg/issues/317
  [4]. https://docs.google.com/document/d/1CPFun2uG-eXdJggqKcPsTdNa2wPMpAdw8loeP-0fm_M/edit?usp=sharing



--
Best Regards


--
Ryan Blue
Software Engineer
Netflix


