Hi Shammon,

After some implementation, I discovered an issue:

replace_branch incurs expensive IO overhead for most operations in the normal code path. For HDFS, it is a namenode access, and for object storage, it is a separately billed request. This is difficult to accept, so if replace_branch is not that useful, I suggest removing this operation.

If we remove replace_branch, can we consider renaming merge_branch, for example to fast_forward, which seems closer to its original meaning?

Best,
Jingsong

On Fri, Sep 29, 2023 at 2:25 AM Jingsong Li <[email protected]> wrote:
>
> Thanks Shammon for driving.
>
> Sounds good to me to start a voting process.
>
> Best,
> Jingsong
>
> On Mon, Sep 25, 2023 at 7:14 PM Shammon FY <[email protected]> wrote:
> >
> > Hi all,
> >
> > Thanks for all the valuable feedback. If there are no more comments, I will
> > start a vote for this PIP in the next 2 days.
> >
> > Best,
> > Shammon FY
> >
> > On Thu, Sep 21, 2023 at 5:19 PM Shammon FY <[email protected]> wrote:
> >
> > > The feature `Replace Main With Branch` is used in duplicate data
> > > correction without modifying jobs. For example:
> > >
> > > 1. We can create branches with the same name for a series of paimon tables
> > > 2. Re-submit all streaming jobs to read and write these branches for tables
> > > 3. After the data in the branch is up to the main, we can stop all the
> > > jobs which read and write main branch
> > > 4. Replace main branch with the created branch, we don't need to do
> > > anything with the jobs read and write the specified branch
> > >
> > > We cannot `Merge Branch To Main` here because the correct jobs will still
> > > read and write the branches which will be completely independent of main.
> > >
> > > Best,
> > > Shammon FY
> > >
> > > On Thu, Sep 21, 2023 at 12:21 AM Jingsong Li <[email protected]> wrote:
> > >
> > >> Can you explain more about "Replace Main With Branch"?
> > >>
> > >> Does this need to be implemented?
> > >>
> > >> Best,
> > >> Jingsong
> > >>
> > >> On Tue, Sep 19, 2023 at 2:17 PM Shammon FY <[email protected]> wrote:
> > >> >
> > >> > Hi ConradJam,
> > >> >
> > >> > How to handle data conflicts between the main branch and branches is a
> > >> > complex problem. At present, we would like to replace data in main with
> > >> > branch directly. You can think that during merge and replace operations,
> > >> > the data after the specified tag in the main branch will be deleted and
> > >> > then the data after the tag in the branch will be used in the main.
> > >> >
> > >> > We can consider "merge" conflicting data in the future when we meet these
> > >> > requirements.
> > >> >
> > >> > Best,
> > >> > Shammon FY
> > >> >
> > >> > On Tue, Sep 19, 2023 at 10:50 AM ConradJam <[email protected]> wrote:
> > >> >
> > >> > > +1 This feature looks a bit like Git's branch management. If this is really
> > >> > > the case, how do we solve the data conflict when merging branches? Do we
> > >> > > need the user to specify that a certain branch data shall prevail?
> > >> > >
> > >> > > Shammon FY <[email protected]> wrote on Mon, Sep 18, 2023 at 20:06:
> > >> > >
> > >> > > > Hi Jingsong,
> > >> > > >
> > >> > > > I have updated the PIP-9 to explain that the main `Snapshot`, `Schema` and
> > >> > > > `Tag` will exist in the base directory by default, just the same as the
> > >> > > > current directory structure. Thanks
> > >> > > >
> > >> > > > Best,
> > >> > > > Shammon FY
> > >> > > >
> > >> > > > On Fri, Sep 15, 2023 at 10:32 AM Shammon FY <[email protected]> wrote:
> > >> > > >
> > >> > > > > Hi Jingsong,
> > >> > > > >
> > >> > > > > Thanks for your suggestion, it sounds good to me. Currently I only
> > >> > > > > mentioned it in the `Compatibility` section, I'll update the PIP to
> > >> > > > > explain this more clearly.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Shammon FY
> > >> > > > >
> > >> > > > > On Wed, Sep 13, 2023 at 12:26 PM Jingsong Li <[email protected]> wrote:
> > >> > > > >
> > >> > > > >> Thanks Shammon for the proposal!
> > >> > > > >>
> > >> > > > >> It looks very good!
> > >> > > > >>
> > >> > > > >> I don't get the main branch file.
> > >> > > > >>
> > >> > > > >> Can we keep the main branch as it is? Just put snapshot/ tag/ schema/
> > >> > > > >> in the table root directory.
> > >> > > > >>
> > >> > > > >> Best,
> > >> > > > >> Jingsong
> > >> > > > >>
> > >> > > > >> On Tue, Sep 12, 2023 at 3:55 PM Shammon FY <[email protected]> wrote:
> > >> > > > >> >
> > >> > > > >> > Hi devs,
> > >> > > > >> >
> > >> > > > >> > I would like to start a discussion about PIP-9: Support Branch [1]. Branch
> > >> > > > >> > in Paimon will help us deal with data correction without copying all data
> > >> > > > >> > from original tables, and it can also enhance Tag for Paimon like
> > >> > > > >> > traditional Hive partition tables, providing data correction capabilities
> > >> > > > >> > on the basis of Tag.
> > >> > > > >> >
> > >> > > > >> > Looking forward to your feedback, thanks!
> > >> > > > >> >
> > >> > > > >> > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-9%3A+Support+Branch
> > >> > > > >> >
> > >> > > > >> > Best,
> > >> > > > >> > Shammon FY
> > >> > >
> > >> > > --
> > >> > > Best
> > >> > >
> > >> > > ConradJam
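
For reference, the replace semantics described in the thread (the data after the specified tag in main is deleted, then the data after the tag in the branch is used in main) can be sketched roughly as below. This is only an illustrative in-memory model under stated assumptions: `Branch`, `fast_forward`, and the snapshot names are hypothetical, not Paimon's actual API or storage layout, and the branch is assumed to hold only the snapshots written after the tag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Branch:
    # Hypothetical stand-in for a branch: an ordered list of snapshot names.
    name: str
    snapshots: List[str] = field(default_factory=list)


def fast_forward(main: Branch, branch: Branch, tag: str) -> None:
    """Drop main's snapshots after `tag`, then append the branch's snapshots.

    Assumption: `branch.snapshots` contains only snapshots written after `tag`.
    """
    if tag not in main.snapshots:
        raise ValueError(f"tag {tag!r} not found in branch {main.name!r}")
    # Keep main up to and including the tagged snapshot.
    cut = main.snapshots.index(tag) + 1
    # Everything main wrote after the tag is discarded in favour of the branch.
    main.snapshots = main.snapshots[:cut] + list(branch.snapshots)


main = Branch("main", ["s1", "s2", "s3", "s4"])
fix = Branch("fix", ["b1", "b2"])  # created from tag "s2"
fast_forward(main, fix, tag="s2")
print(main.snapshots)  # ['s1', 's2', 'b1', 'b2']
```

The sketch makes no attempt to resolve conflicts: as noted in the thread, the branch's data simply wins over whatever main wrote after the tag.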
