The HBASE-24749 design and implementation had acknowledged compromises at
review time, e.g. adding a new 'system table' to hold store files. I'd suggest
that both need a revisit before we go forward; for instance, factoring for
systems other than S3, as suggested above (I like the Duo list).

S

On Wed, May 19, 2021 at 8:19 AM 张铎(Duo Zhang) <[email protected]> wrote:

> What about just storing the hfile list in a file? Since now S3 has strong
> consistency, we could safely overwrite a file then I think?
>
> And since the hfile list file will be very small, renaming will not be a
> big problem.
>
> We could write the hfile list to a file called 'hfile.list.tmp', and then
> rename it to 'hfile.list'.
>
> This is safe on HDFS. On S3 the rename is not atomic, so we could end up
> with no 'hfile.list' file but with an 'hfile.list.tmp' file present.
>
> So when opening an HStore, we first check whether 'hfile.list' is there; if
> not, we try 'hfile.list.tmp', rename it, and load it. For safety, we could
> write an initial hfile list file with no hfiles. Then if we cannot load
> either 'hfile.list' or 'hfile.list.tmp', we know something is wrong, and
> users should try to fix it with HBCK.
> And in HBCK, we will do a listing and generate the 'hfile.list' file.
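
A minimal sketch of the scheme quoted above, for discussion only. It uses
java.nio.file on a local directory as a stand-in for the Hadoop FileSystem
API, and the class name HFileListStore and its methods are hypothetical, not
part of HBase. It assumes one hfile name per line in the list file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical helper illustrating the 'hfile.list' idea; not HBase code. */
final class HFileListStore {
  static final String LIST = "hfile.list";
  static final String TMP = "hfile.list.tmp";

  /** Write the full list to the tmp file, then rename it over the live file. */
  static void write(Path storeDir, List<String> hfiles) throws IOException {
    Path tmp = storeDir.resolve(TMP);
    Files.write(tmp, hfiles, StandardCharsets.UTF_8);
    // On a store without atomic rename, a crash here can leave only the tmp file.
    Files.move(tmp, storeDir.resolve(LIST), StandardCopyOption.REPLACE_EXISTING);
  }

  /** Load the list, finishing a rename that never completed if necessary. */
  static List<String> load(Path storeDir) throws IOException {
    Path list = storeDir.resolve(LIST);
    Path tmp = storeDir.resolve(TMP);
    if (!Files.exists(list)) {
      if (!Files.exists(tmp)) {
        // Neither file exists: something is wrong, leave the repair to tooling.
        throw new IOException("Neither " + LIST + " nor " + TMP + " found in "
            + storeDir + "; repair needed (e.g. with HBCK)");
      }
      Files.move(tmp, list);
    }
    return Files.readAllLines(list, StandardCharsets.UTF_8);
  }
}
```

Calling write(dir, emptyList) once at store creation gives the "initial hfile
list file with no hfiles", so a later load() that finds neither file can be
treated as an error. The sketch does not validate the contents of a leftover
tmp file, which a real implementation would probably need to do.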
>
> WDYT?
>
> Thanks.
>
> On Wed, May 19, 2021 at 10:43 PM, Wellington Chevreuil <[email protected]> wrote:
>
> > Thank you, Andrew and Duo,
> >
> > Talking internally with Josh Elser, the initial idea was to rebase the
> > feature branch on master (in order to catch up with the latest commits),
> > then focus on the work needed for a minimal functioning HBase. In other
> > words, together with the already committed work from HBASE-25391, make
> > sure flushes, compactions, splits and merges can all take advantage of the
> > persistent store file manager and complete with no need to rely on
> > renames. These all map to the subtasks HBASE-25391, HBASE-25392 and
> > HBASE-25393. Once we can test and validate that this works well for our
> > goals, we can then focus on snapshots, bulk loading and tooling.
> >
> > > S3 now supports strong consistency, and I heard that they are also
> > > implementing atomic renaming currently, so maybe that's one of the
> > > reasons why the development is silent now...
> > >
> > Interesting, I had no idea this was being implemented. I know, however,
> > that a version of this feature is already available in the latest EMR
> > releases (at least from 6.2.0), and the AWS team has published their own
> > blog post with their results:
> >
> > https://aws.amazon.com/blogs/big-data/amazon-emr-6-2-0-adds-persistent-hfile-tracking-to-improve-performance-with-hbase-on-amazon-s3/
> >
> > > But I do not think storing the hfile list in meta is the only solution.
> > > It will cause cyclic dependencies for hbase:meta, and then force us to
> > > have a fallback solution which makes the code a bit ugly. We should try
> > > to see if this could be done with only the FileSystem.
> > >
> > This is indeed a relevant concern. One idea I had mentioned in the
> > original design doc was to track committed/non-committed files through
> > xattr (or tags), which may have its own performance issues as explained by
> > Stephen Wu, but is something that could be attempted.
> >
> > On Wed, May 19, 2021 at 04:56, 张铎(Duo Zhang) <[email protected]> wrote:
> >
> > > S3 now supports strong consistency, and I heard that they are also
> > > implementing atomic renaming currently, so maybe that's one of the
> > > reasons why the development is silent now...
> > >
> > > For me, I also think deploying HBase on cloud storage is the future, so
> > > I would also like to participate here.
> > >
> > > But I do not think storing the hfile list in meta is the only solution.
> > > It will cause cyclic dependencies for hbase:meta, and then force us to
> > > have a fallback solution which makes the code a bit ugly. We should try
> > > to see if this could be done with only the FileSystem.
> > >
> > > Thanks.
> > >
> > > On Wed, May 19, 2021 at 8:04 AM, Andrew Purtell <[email protected]> wrote:
> > >
> > > > Wellington (et al.),
> > > >
> > > > S3 is also an important piece of our future production plans.
> > > > Unfortunately, we were unable to assist much with last year's work, on
> > > > account of being sidetracked by more immediate concerns. Fortunately,
> > > > this renewed interest is timely in that we have an HBase 2 project
> > > > where, if this can land in a 2.5 or a 2.6, it could be an important
> > > > cost-to-serve optimization, and one we could and would make use of.
> > > > Therefore I would like to restate my employer's interest in this work
> > > > too. It may just be Viraj and myself in the early days.
> > > >
> > > > I'm not sure how best to collaborate. We could review changes from the
> > > > original authors, new changes, and/or divide up the development tasks.
> > > > We can certainly offer our time for testing, and can afford the costs
> > > > of testing against the S3 service.
> > > >
> > > >
> > > > On Tue, May 18, 2021 at 12:16 PM Wellington Chevreuil <[email protected]> wrote:
> > > >
> > > > > Greetings everyone,
> > > > >
> > > > > HBASE-24749 was proposed almost a year ago, introducing a new
> > > > > StoreFile tracker as a way to allow any HBase hfile modifications
> > > > > to be safely completed without needing a file system rename. This
> > > > > seems pretty relevant for deployments over S3 file systems, where
> > > > > rename operations are not atomic and can suffer performance
> > > > > degradation when multiple requests are submitted concurrently to
> > > > > the same bucket. We ran superficial tests and YCSB runs, where
> > > > > individual renames of files larger than 5GB can take a few hundred
> > > > > seconds to complete. We also observed an impact on write load
> > > > > throughput, the bottleneck potentially being the renames.
> > > > >
> > > > > With S3 being an important piece of my employer's cloud solution,
> > > > > we would like to help it move forward. We plan to contribute new
> > > > > patches per the original design/Jira, but we'd also be happy to
> > > > > review changes from the original authors, too. Please let us know
> > > > > if anyone has any concerns; otherwise we'll start to self-assign
> > > > > issues on HBASE-24749.
> > > > >
> > > > > Wellington
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > > decrepit hands
> > > >    - A23, Crosstalk
> > > >
> > >
> >
>
