+1 on not creating a branch now. Rebasing and maintenance are too expensive compared with the pace of development. Some additional thoughts below.
The row-delete feature should be behind a feature flag, which implies that it should have minimal impact on the master branch when it is turned off. Working on master avoids the pain of breaking master at branch merge; in effect it works in a fail-fast, fail-early mode. Working on master will not prevent splitting the feature into small items. Instead, it will encourage more people to work on it and help the community stay focused on the master roadmap.

Finally, if we think about rebasing, it ends up being either too expensive or easy. In the former case, we should not create a branch because it is hard to keep in sync with master. In the latter case, the feature has little impact on master and there is no need for a branch.

Thanks!
Miao

From: Ryan Blue <rb...@netflix.com.INVALID>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>
Date: Tuesday, March 31, 2020 at 10:08 AM
To: OpenInx <open...@gmail.com>
Cc: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Open a new branch for row-delete feature ?

I'm fine starting a branch later if we do run into those issues, but I don't think it is a good idea to do it now in anticipation. All of the work that we can do on master we should try to do on master. We can start a branch when we need one.

On Mon, Mar 30, 2020 at 7:44 PM OpenInx <open...@gmail.com> wrote:

Hi Ryan

The reason I suggest opening a new dev branch for row-delete development is this: we will split the whole feature into many small issues, and each issue will have a pull request of appropriate size, so that contributors and reviewers can discuss one point at a time and iterate on the feature faster.
In the process of implementation, we will ensure that v1 works for every separate PR, but it may not be ready for cutting a release. For example, when releasing 0.8.0, I'm sure we won't want the released version to contain only part of the v2 spec (such as providing the sequence_number but no file_type). The Spark reader/writer and the data/delete manifests may also need some refactoring, which could be spread across several PRs. Splitting the work into multiple pull requests may block the release of a new version for a certain period of time, which is not what we want to see.

About maintenance of the new branch: in my experience we could rebase the new branch on master periodically (for example, every three days), so that each new pull request for row-delete is based on the newest changes. This should work as long as master does not have too many new changes, which is in line with our current situation.

In this case, I weighed the maintenance costs of the new branch against the delay of row-delete. I think we should let row-delete go a little faster (almost all community users are looking forward to this feature), and I think the current maintenance cost is acceptable.

Thanks

On Tue, Mar 31, 2020 at 5:52 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

Sorry, I didn't address the suggestion to add a Flink branch as well. The work needed for the Flink sink is to remove parts that are specific to Netflix, so I'm not sure what the rationale for a branch would be. Is there a reason why this can't be done in master, but requires a shared branch? If multiple people want to contribute, why not contribute to the same PR? A shared PR branch makes the most sense to me for this because it is regularly tested against master.

On Mon, Mar 30, 2020 at 2:48 PM Ryan Blue <rb...@netflix.com> wrote:

I think we may eventually want a branch, but I think it is too early to create one now. Branches are expensive.
They require maintenance to stay in sync with master, usually copying changes from master into the branch with updates. Updating the changes to master for the branch is more difficult because it is usually not the original contributor or reviewer porting them. And it is better to catch problems between changes in master and the branch early.

I'm not against branches, but I don't want to create them unless they are valuable. In this case, I don't see the value. We plan to add v2 in parallel so you can still write v1 tables for compatibility, and most of the work that needs to be done -- like creating readers and writers for diff formats -- can be done in master.

rb

On Mon, Mar 30, 2020 at 9:00 AM Gautam <gautamkows...@gmail.com> wrote:

Thanks for bringing this up OpenInx. That's a great idea: to open a separate branch for row-level deletes. I would like to help support/contribute/review this as well. If there are sub-tasks you guys have identified that can be added to https://github.com/apache/incubator-iceberg/milestone/4 we can start taking those up too.

thanks for the good work,
- Gautam.

On Mon, Mar 30, 2020 at 8:39 AM Junjie Chen <chenjunjied...@gmail.com> wrote:

+1 to create the branch. Some row-level delete subtasks must be based on the sequence number as well as end to end tests.

On Fri, Mar 27, 2020 at 4:42 PM OpenInx <open...@gmail.com> wrote:

Dear Dev:

On Tuesday, we had a sync meeting and discussed the following:
1. cutting the 0.8.0 release;
2. the Flink connector;
3. Iceberg row-level delete;
4. Map-Reduce formats and Hive support.
We'll release version 0.8.0 around April 15, and the following 0.9.0 will be released in the next few months. On the other hand, Ryan, Junjie Chen and I have done three PoC versions of row-level deletes. We had a full discussion [4] and started on the relevant code design. We're sure that the feature will introduce some incompatible specification changes, such as the sequence_number spec [1] and the file_type spec [2]; the sortedOrder feature also seems to be a breaking change [3].

To avoid affecting the 0.8.0 release and to push the row-delete feature forward early, I suggest opening a new branch for the row-delete feature, named branch-1. Once the row-delete feature is stable, we could release 1.0.0. Or we can just open a row-delete feature branch and, once the work is done, merge it back to the master branch and continue on to the 0.9.0 release. I guess the Flink connector devs are facing the same problem?

What do you think about this?

Thank you.

[1]. https://github.com/apache/incubator-iceberg/pull/588
[2]. https://github.com/apache/incubator-iceberg/issues/824
[3]. https://github.com/apache/incubator-iceberg/issues/317
[4]. https://docs.google.com/document/d/1CPFun2uG-eXdJggqKcPsTdNa2wPMpAdw8loeP-0fm_M/edit?usp=sharing

--
Best Regards

--
Ryan Blue
Software Engineer
Netflix