IIRC, S3 was the only object storage which did not guarantee read-after-write consistency in the past...
This is the quick result after some googling:

AWS [1]

> Amazon S3 delivers strong read-after-write consistency automatically for
> all applications

Azure [2]

> Azure Storage was designed to embrace a strong consistency model that
> guarantees that after the service performs an insert or update operation,
> subsequent read operations return the latest update.

Aliyun [3]

> A feature requires that object operations in OSS be atomic, which
> indicates that operations can only either succeed or fail without
> intermediate states. To ensure that users can access only complete data,
> OSS does not return corrupted or partial data.
>
> Object operations in OSS are highly consistent. For example, when a user
> receives an upload (PUT) success response, the uploaded object can be read
> immediately, and copies of the object are written to multiple devices for
> redundancy. Therefore, the situations where data is not obtained when you
> perform the read-after-write operation do not exist. The same is true for
> delete operations. After you delete an object, the object and its copies
> no longer exist.

GCP [4]

> Cloud Storage provides strong global consistency for the following
> operations, including both data and metadata:
>
> Read-after-write
> Read-after-metadata-update
> Read-after-delete
> Bucket listing
> Object listing

I think these vendors could cover most end users in the world?

1. https://aws.amazon.com/cn/s3/consistency/
2. https://docs.microsoft.com/en-us/azure/storage/blobs/concurrency-manage?tabs=dotnet
3. https://www.alibabacloud.com/help/doc-detail/31827.htm
4. https://cloud.google.com/storage/docs/consistency

On Wed, May 19, 2021 at 11:40 PM, Nick Dimiduk <[email protected]> wrote:

> On Wed, May 19, 2021 at 8:19 AM 张铎(Duo Zhang) <[email protected]>
> wrote:
>
> > What about just storing the hfile list in a file? Since S3 now has
> > strong consistency, we could safely overwrite a file then, I think?
>
> My concern is about portability. S3 isn't the only blob store in town,
> and consistent read-what-you-wrote semantics are not a standard feature,
> as far as I know. If we want something that can work on 3 or 5 major
> public cloud blobstore products as well as a smattering of on-prem
> technologies, we should be selective about what features we choose to
> rely on as foundational to our implementation.
>
> Or we are explicitly saying this will only work on S3, and we'll only
> support other services when they can achieve this level of compatibility.
>
> Either way, we should be clear and up-front about what semantics we
> demand. Implementing some kind of a test harness that can check
> compatibility would help here, a similar effort to that of defining
> standard behaviors of HDFS implementations.
>
> I love this discussion :)
>
> > And since the hfile list file will be very small, renaming will not be
> > a big problem.
>
> Would this be a file per store? A file per region? Ah. Below you imply
> it's per store.
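To make the idea a bit more concrete (and to answer the per-store question): yes, what I have in mind is one small list file per store that we simply rewrite whenever the set of committed hfiles changes. A very rough sketch against the Hadoop FileSystem API is below; the class name and the ".filelist" file name are made up for illustration, this is not code from the feature branch.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch only: the committed hfiles of one store are listed in a small
 * manifest file that is rewritten in full on every flush/compaction.
 * Assumes a single writer per store (the region server owning the region).
 */
public class StoreFileListSketch {

  private final FileSystem fs;
  private final Path listFile; // e.g. <table>/<region>/<family>/.filelist

  public StoreFileListSketch(FileSystem fs, Path storeDir) {
    this.fs = fs;
    this.listFile = new Path(storeDir, ".filelist");
  }

  /** Rewrite the manifest with the new set of committed hfile names. */
  public void commit(List<String> hfileNames) throws IOException {
    // overwrite = true: on a strongly consistent store with atomic PUT,
    // readers see either the old or the new complete list, never a mix.
    try (FSDataOutputStream out = fs.create(listFile, true)) {
      out.write(String.join("\n", hfileNames).getBytes(StandardCharsets.UTF_8));
    }
  }

  /** Load the committed list; hfiles not named here are simply ignored. */
  public List<String> load() throws IOException {
    try (FSDataInputStream in = fs.open(listFile)) {
      String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
      return content.isEmpty() ? List.of() : List.of(content.split("\n"));
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    StoreFileListSketch tracker =
        new StoreFileListSketch(fs, new Path("/hbase/data/default/t1/r1/cf1"));
    tracker.commit(List.of("aaa-hfile", "bbb-hfile"));
    System.out.println(tracker.load());
  }
}

On HDFS, where create-with-overwrite truncates in place, we would probably still write a temp file and rename it into place, but renaming such a tiny list file is cheap, which is what I meant above.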
> > On Wed, May 19, 2021 at 10:43 PM, Wellington Chevreuil
> > <[email protected]> wrote:
> >
> > > Thank you, Andrew and Duo,
> > >
> > > Talking internally with Josh Elser, the initial idea was to rebase the
> > > feature branch onto master (in order to catch up with the latest
> > > commits), then focus on work to have a minimal functioning hbase; in
> > > other words, together with the already committed work from
> > > HBASE-25391, make sure flush, compactions, splits and merges can all
> > > take advantage of the persistent store file manager and complete with
> > > no need to rely on renames. These all map to the subtasks HBASE-25391,
> > > HBASE-25392 and HBASE-25393. Once we can test and validate that this
> > > works well for our goals, we can then focus on snapshots, bulk loading
> > > and tooling.
> > >
> > > > S3 now supports strong consistency, and I heard that they are also
> > > > implementing atomic renaming currently, so maybe that's one of the
> > > > reasons why the development is silent now...
> > >
> > > Interesting, I had no idea this was being implemented. I know,
> > > however, that a version of this feature is already available on the
> > > latest EMR releases (at least from 6.2.0), and the AWS team has
> > > published their own blog post with their results:
> > >
> > > https://aws.amazon.com/blogs/big-data/amazon-emr-6-2-0-adds-persistent-hfile-tracking-to-improve-performance-with-hbase-on-amazon-s3/
> > >
> > > > But I do not think storing the hfile list in meta is the only
> > > > solution. It will cause cyclic dependencies for hbase:meta, and then
> > > > force us to have a fallback solution, which makes the code a bit
> > > > ugly. We should try to see if this could be done with only the
> > > > FileSystem.
> > >
> > > This is indeed a relevant concern. One idea I had mentioned in the
> > > original design doc was to track committed/non-committed files through
> > > xattr (or tags), which may have its own performance issues as
> > > explained by Stephen Wu, but is something that could be attempted.
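(Just to illustrate the xattr idea above, I guess it would look roughly like the sketch below against the Hadoop FileSystem API; the attribute name is made up. HDFS implements xattrs natively, while an object store would have to map them onto tags or per-object metadata, which is probably where the extra round trips and the performance concern come from.)

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch: mark fully written hfiles as committed via an xattr/tag. */
public class CommittedMarkerSketch {

  private static final String COMMITTED_XATTR = "user.hbase.committed";
  private static final byte[] TRUE = new byte[] { 1 };

  private final FileSystem fs;

  public CommittedMarkerSketch(FileSystem fs) {
    this.fs = fs;
  }

  /** Called after the writer has finished and closed the new hfile. */
  public void markCommitted(Path hfile) throws IOException {
    fs.setXAttr(hfile, COMMITTED_XATTR, TRUE);
  }

  /** Readers skip any hfile that does not carry the committed flag. */
  public boolean isCommitted(Path hfile) {
    try {
      return Arrays.equals(fs.getXAttr(hfile, COMMITTED_XATTR), TRUE);
    } catch (IOException e) {
      // attribute missing or xattrs unsupported: treat as not committed
      return false;
    }
  }
}

Compared to the list-file approach this needs an extra metadata call per hfile when opening a store, so the list file still looks simpler to me, but both avoid renaming the hfiles themselves.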
> > > On Wed, May 19, 2021 at 04:56, 张铎(Duo Zhang)
> > > <[email protected]> wrote:
> > >
> > > > S3 now supports strong consistency, and I heard that they are also
> > > > implementing atomic renaming currently, so maybe that's one of the
> > > > reasons why the development is silent now...
> > > >
> > > > For me, I also think deploying hbase on cloud storage is the future,
> > > > so I would also like to participate here.
> > > >
> > > > But I do not think storing the hfile list in meta is the only
> > > > solution. It will cause cyclic dependencies for hbase:meta, and then
> > > > force us to have a fallback solution, which makes the code a bit
> > > > ugly. We should try to see if this could be done with only the
> > > > FileSystem.
> > > >
> > > > Thanks.
> > > >
> > > > On Wed, May 19, 2021 at 8:04 AM, Andrew Purtell
> > > > <[email protected]> wrote:
> > > >
> > > > > Wellington (et al.),
> > > > >
> > > > > S3 is also an important piece of our future production plans.
> > > > > Unfortunately, we were unable to assist much with last year's
> > > > > work, on account of being sidetracked by more immediate concerns.
> > > > > Fortunately, this renewed interest is timely in that we have an
> > > > > HBase 2 project where, if this can land in a 2.5 or a 2.6, it
> > > > > could be an important cost-to-serve optimization, and one we could
> > > > > and would make use of. Therefore I would like to restate my
> > > > > employer's interest in this work too. It may just be Viraj and
> > > > > myself in the early days.
> > > > >
> > > > > I'm not sure how best to collaborate. We could review changes from
> > > > > the original authors, review new changes, and/or divide up the
> > > > > development tasks. We can certainly offer our time for testing,
> > > > > and can afford the costs of testing against the S3 service.
> > > > >
> > > > > On Tue, May 18, 2021 at 12:16 PM Wellington Chevreuil <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Greetings everyone,
> > > > > >
> > > > > > HBASE-24749 was proposed almost a year ago, introducing a new
> > > > > > StoreFile tracker as a way to allow any hbase hfile
> > > > > > modifications to be safely completed without needing a file
> > > > > > system rename. This seems pretty relevant for deployments over
> > > > > > S3 file systems, where rename operations are not atomic and can
> > > > > > suffer performance degradation when multiple requests are
> > > > > > concurrently submitted to the same bucket. We had done
> > > > > > superficial tests and ycsb runs, where individual renames of
> > > > > > files larger than 5GB can take a few hundred seconds to
> > > > > > complete. We also observed impacts on write load throughput, the
> > > > > > bottleneck potentially being the renames.
> > > > > >
> > > > > > With S3 being an important piece of my employer's cloud
> > > > > > solution, we would like to help it move forward. We plan to
> > > > > > contribute new patches per the original design/Jira, but we'd
> > > > > > also be happy to review changes from the original authors.
> > > > > > Please let us know if anyone has any concerns, otherwise we'll
> > > > > > start to self-assign issues on HBASE-24749.
> > > > > >
> > > > > > Wellington
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrew
> > > > >
> > > > > Words like orphans lost among the crosstalk, meaning torn from
> > > > > truth's decrepit hands
> > > > >    - A23, Crosstalk
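Coming back to Nick's point about a compatibility test harness: a very rough probe like the one below could be run against each FileSystem implementation to check the behaviours we want to rely on. This is only a sketch of the shape of such a check; a real harness would repeat it many times and also cover overwrite and delete visibility.

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Minimal read-after-write / list-after-write probe for a FileSystem. */
public class ReadAfterWriteProbe {

  public static void main(String[] args) throws IOException {
    // pass a base URI, e.g. s3a://my-bucket/probe or hdfs:///tmp/probe
    Path base = new Path(args[0]);
    FileSystem fs = base.getFileSystem(new Configuration());
    fs.mkdirs(base);

    byte[] payload = "hello".getBytes();
    Path file = new Path(base, "probe-" + System.nanoTime());

    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write(payload);
    }

    // 1. read-after-write: the new object must be readable immediately
    byte[] readBack = new byte[payload.length];
    try (FSDataInputStream in = fs.open(file)) {
      in.readFully(readBack);
    }
    if (!Arrays.equals(payload, readBack)) {
      throw new AssertionError("read-after-write returned different data");
    }

    // 2. list-after-write: the new object must show up in a listing
    boolean listed = Arrays.stream(fs.listStatus(base))
        .anyMatch(st -> st.getPath().getName().equals(file.getName()));
    if (!listed) {
      throw new AssertionError("new object missing from parent listing");
    }

    System.out.println("read-after-write and list-after-write look consistent");
  }
}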
