Based solely on the comments made on this thread, I would recommend against
a merge to branch-2, given that we are very close to 2.5. The points about
existing gaps seem like things we're not ready to publish in the impending
minor release. Once we have a branch-2.5, this particular concern of mine
will be alleviated.

Thanks,
Nick

On Wed, Dec 8, 2021 at 1:37 PM Josh Elser <els...@apache.org> wrote:

> I was going to wait for some other folks to chime in, but I guess I can
> be the next one :)
>
> Duo, Wellington, and Szabolcs have been doing some excellent work on the
> storefile tracking (SFT) to a degree that I never expected to see. I
> remember some of the original "Filesystem re-do" issues on Jira. The
> idea was exceptional, but the result seemed unreachable.
>
> These devs, building on the success of what Zach/Stephen first talked
> about in HBASE-24749, came up with what I think is an excellent step
> forward. I've yet to break it via my own testing, but do acknowledge
> that there's always more work to be done.
>
> I think this is at a reasonable place to merge back into the
> "mainline" branches from the feature branch (HBASE-26067). I believe
> this is ready because:
>
> 1. The feature is completely opt-in (HBase works the same way by default)
> 2. There is an API to migrate tables into the new SFT implementation
> 3. There is also an API to migrate tables back to the default implementation
>
> Some gaps still exist around bulk loading, documentation, snapshots, and
> recovery tooling, but these are being worked on. In the context of S3,
> this makes for a significantly more compelling offering of HBase by
> removing the complexity of HBOSS. For HBase in all installations, I think
> SFT makes for a significantly more "deterministic" way of managing
> regions/files.
>
> +1 from me to merge HBASE-26067 into master and branch-2
>
> - Josh
>
> On 12/7/21 10:31 AM, Wellington Chevreuil wrote:
> > Hello everyone,
> >
> > We have been making progress on the alternative way of tracking store
> > files originally proposed by Duo in HBASE-26067.
> >
> > To briefly summarize it for those not following it, this feature
> > introduces an abstraction layer to track store files still used/needed by
> > store engines, allowing for plugging different approaches of identifying
> > store files required by the given store. The design doc describing it in
> > more detail is available here
> > <https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.calrs3kn4d8s>.
> >
> > Our main goal within this feature is to avoid the need for using temp
> > files and renames when creating new hfiles (whenever flushing, compacting,
> > splitting/merging or snapshotting). This is made possible by the pluggable
> > tracker implementation labeled "FILE". The current behavior using temp
> > dirs and renames would still be the default approach (labeled "DEFAULT").
> >
> > This "renameless" approach is appealing for deployments using Amazon S3
> > Object store file system, where the lack of atomic rename operations
> > imposed the necessity of an additional layer of locking (HBOSS), which
> > combined with the s3a rename operation can have a performance overhead.
> >
> > Some test runs on my employer's infrastructure have shown promising
> > results. A pure insertion YCSB run has shown a ~6% performance gain on the
> > client writes. A snapshot clone of a table with hundreds of regions
> > completes in half the time. There are also improvements in compaction,
> > split and merge times.
> >
> > Talking with Duo Zhang and Josh Elser in the HBASE-26067 jira, we feel
> > optimistic that the current implementation is in a good state to get
> > merged into master branch, but it would be nice to hear other opinions
> > about it, before we effectively commit it. Looking forward to hearing some
> > thoughts/concerns you might have.
> >
> > Kind regards,
> > Wellington.
> >
>
