Hi Shammon,

After some implementation, I discovered an issue:

replace_branch incurs an expensive IO overhead for most operations in
the normal code path. On HDFS, it is an extra NameNode access, and on
object storage, it is a separately billed request.

This is difficult to accept, so if replace_branch is not that useful, I
suggest removing this operation.

If we remove replace_branch, could we also consider renaming
merge_branch, for example to fast_forward, which seems closer to
what the operation actually does?

Best,
Jingsong

On Fri, Sep 29, 2023 at 2:25 AM Jingsong Li <[email protected]> wrote:
>
> Thanks Shammon for driving.
>
> Sounds good to me to start a voting process.
>
> Best,
> Jingsong
>
> On Mon, Sep 25, 2023 at 7:14 PM Shammon FY <[email protected]> wrote:
> >
> > Hi all,
> >
> > Thanks for all the valuable feedback. If there are no more comments, I will
> > start a vote for this PIP in the next 2 days.
> >
> > Best,
> > Shammon FY
> >
> >
> > On Thu, Sep 21, 2023 at 5:19 PM Shammon FY <[email protected]> wrote:
> >
> > > The feature `Replace Main With Branch` is used in duplicate data
> > > correction without modifying jobs. For example:
> > >
> > > 1. We can create branches with the same name for a series of Paimon tables
> > > 2. Re-submit all streaming jobs to read and write these branches of the
> > > tables
> > > 3. After the data in the branches has caught up with main, we can stop all
> > > the jobs which read and write the main branch
> > > 4. Replace the main branch with the created branch; we don't need to do
> > > anything with the jobs that read and write the specified branch
> > >
> > > We cannot `Merge Branch To Main` here because the correct jobs will still
> > > read and write the branches which will be completely independent of main.
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > >
> > >
> > > On Thu, Sep 21, 2023 at 12:21 AM Jingsong Li <[email protected]> wrote:
> > >
> > >> Can you explain more about "Replace Main With Branch"?
> > >>
> > >> Does this need to be implemented?
> > >>
> > >> Best,
> > >> Jingsong
> > >>
> > >> On Tue, Sep 19, 2023 at 2:17 PM Shammon FY <[email protected]> wrote:
> > >> >
> > >> > Hi ConradJam,
> > >> >
> > >> > How to handle data conflicts between the main branch and other branches
> > >> > is a complex problem. At present, we would like to replace the data in
> > >> > main with the branch directly. You can think of it this way: during merge
> > >> > and replace operations, the data after the specified tag in the main
> > >> > branch is deleted, and the data after the tag in the branch is then used
> > >> > in main.
> > >> >
> > >> > We can consider "merging" conflicting data in the future when we meet
> > >> > such requirements.
> > >> >
> > >> > Best,
> > >> > Shammon FY
> > >> >
> > >> > On Tue, Sep 19, 2023 at 10:50 AM ConradJam <[email protected]> wrote:
> > >> >
> > >> > > +1 This feature looks a bit like Git’s branch management. If this is
> > >> > > really the case, how do we solve data conflicts when merging branches? Do
> > >> > > we need the user to specify that a certain branch's data shall prevail?
> > >> > >
> > >> > > On Mon, Sep 18, 2023 at 8:06 PM Shammon FY <[email protected]> wrote:
> > >> > >
> > >> > > > Hi Jingsong,
> > >> > > >
> > >> > > > I have updated PIP-9 to explain that the main `Snapshot`, `Schema` and
> > >> > > > `Tag` will exist in the base directory by default, the same as the
> > >> > > > current directory structure. Thanks
> > >> > > >
> > >> > > > Best,
> > >> > > > Shammon FY
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Sep 15, 2023 at 10:32 AM Shammon FY <[email protected]> wrote:
> > >> > > >
> > >> > > > > Hi Jingsong,
> > >> > > > >
> > >> > > > > Thanks for your suggestion, it sounds good to me. Currently I only
> > >> > > > > mentioned it in the `Compatibility` section; I'll update the PIP to
> > >> > > > > explain this more clearly.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Shammon FY
> > >> > > > >
> > >> > > > > On Wed, Sep 13, 2023 at 12:26 PM Jingsong Li <[email protected]> wrote:
> > >> > > > >
> > >> > > > >> Thanks Shammon for the proposal!
> > >> > > > >>
> > >> > > > >> It looks very good!
> > >> > > > >>
> > >> > > > >> I don't get the main branch file.
> > >> > > > >>
> > >> > > > >> Can we keep the main branch as it is? Just put snapshot/, tag/ and
> > >> > > > >> schema/ in the table root directory.
> > >> > > > >>
> > >> > > > >> Best,
> > >> > > > >> Jingsong
> > >> > > > >>
> > >> > > > >> On Tue, Sep 12, 2023 at 3:55 PM Shammon FY <[email protected]> wrote:
> > >> > > > >> >
> > >> > > > >> > Hi devs,
> > >> > > > >> >
> > >> > > > >> > I would like to start a discussion about PIP-9: Support Branch [1].
> > >> > > > >> > Branch in Paimon will help us deal with data correction without
> > >> > > > >> > copying all data from original tables, and it can also enhance Tag for
> > >> > > > >> > Paimon like traditional Hive partition tables, providing data
> > >> > > > >> > correction capabilities on the basis of Tag.
> > >> > > > >> >
> > >> > > > >> > Looking forward to your feedback, thanks!
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-9%3A+Support+Branch
> > >> > > > >> >
> > >> > > > >> > Best,
> > >> > > > >> > Shammon FY
> > >> > > > >>
> > >> > > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Best
> > >> > >
> > >> > > ConradJam
> > >> > >
> > >>
> > >
