IIRC, S3 was the only object storage which did not guarantee read-after-write consistency in the past...
This is the quick result after some googling:

AWS [1]

> Amazon S3 delivers strong read-after-write consistency automatically for
> all applications

Azure [2]

> Azure Storage was designed to embrace a strong consistency model that
> guarantees that after the service performs an insert or update operation,
> subsequent read operations return the latest update.

Aliyun [3]

> A feature requires that object operations in OSS be atomic, which
> indicates that operations can only either succeed or fail without
> intermediate states. To ensure that users can access only complete data,
> OSS does not return corrupted or partial data.
>
> Object operations in OSS are highly consistent. For example, when a user
> receives an upload (PUT) success response, the uploaded object can be read
> immediately, and copies of the object are written to multiple devices for
> redundancy. Therefore, the situations where data is not obtained when you
> perform the read-after-write operation do not exist. The same is true for
> delete operations. After you delete an object, the object and its copies
> no longer exist.

GCP [4]

> Cloud Storage provides strong global consistency for the following
> operations, including both data and metadata:
>
> Read-after-write
> Read-after-metadata-update
> Read-after-delete
> Bucket listing
> Object listing

I think these vendors could cover most end users in the world?

1. https://aws.amazon.com/cn/s3/consistency/
2. https://docs.microsoft.com/en-us/azure/storage/blobs/concurrency-manage?tabs=dotnet
3. https://www.alibabacloud.com/help/doc-detail/31827.htm
4. https://cloud.google.com/storage/docs/consistency

On Wed, May 19, 2021 at 11:40 PM, Nick Dimiduk <[email protected]> wrote:

> On Wed, May 19, 2021 at 8:19 AM 张铎(Duo Zhang) <[email protected]>
> wrote:
>
> > What about just storing the hfile list in a file? Since S3 now has
> > strong consistency, we could safely overwrite a file then, I think?
>
> My concern is about portability. S3 isn't the only blob store in town,
> and consistent read-what-you-wrote semantics are not a standard feature,
> as far as I know. If we want something that can work on 3 or 5 major
> public cloud blobstore products as well as a smattering of on-prem
> technologies, we should be selective about what features we choose to
> rely on as foundational to our implementation.
>
> Or we are explicitly saying this will only work on S3, and we'll only
> support other services when they can achieve this level of compatibility.
>
> Either way, we should be clear and up-front about what semantics we
> demand. Implementing some kind of a test harness that can check
> compatibility would help here, a similar effort to that of defining
> standard behaviors of HDFS implementations.
>
> I love this discussion :)
>
> > And since the hfile list file will be very small, renaming will not be
> > a big problem.
>
> Would this be a file per store? A file per region? Ah. Below you imply
> it's per store.
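To make the idea a bit more concrete (and to answer the per-store question): yes, what I have in mind is one small list file per store that we simply rewrite whenever the set of committed hfiles changes. A very rough sketch against the Hadoop FileSystem API is below; the class name and the ".filelist" file name are made up for illustration, this is not code from the feature branch.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch only: the committed hfiles of one store are listed in a small
 * manifest file that is rewritten in full on every flush/compaction.
 * Assumes a single writer per store (the region server owning the region).
 */
public class StoreFileListSketch {

  private final FileSystem fs;
  private final Path listFile; // e.g. <table>/<region>/<family>/.filelist

  public StoreFileListSketch(FileSystem fs, Path storeDir) {
    this.fs = fs;
    this.listFile = new Path(storeDir, ".filelist");
  }

  /** Rewrite the manifest with the new set of committed hfile names. */
  public void commit(List<String> hfileNames) throws IOException {
    // overwrite = true: on a strongly consistent store with atomic PUT,
    // readers see either the old or the new complete list, never a mix.
    try (FSDataOutputStream out = fs.create(listFile, true)) {
      out.write(String.join("\n", hfileNames).getBytes(StandardCharsets.UTF_8));
    }
  }

  /** Load the committed list; hfiles not named here are simply ignored. */
  public List<String> load() throws IOException {
    try (FSDataInputStream in = fs.open(listFile)) {
      String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
      return content.isEmpty() ? List.of() : List.of(content.split("\n"));
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    StoreFileListSketch tracker =
        new StoreFileListSketch(fs, new Path("/hbase/data/default/t1/r1/cf1"));
    tracker.commit(List.of("aaa-hfile", "bbb-hfile"));
    System.out.println(tracker.load());
  }
}

On HDFS, where create-with-overwrite truncates in place, we would probably still write a temp file and rename it into place, but renaming such a tiny list file is cheap, which is what I meant above.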
> > On Wed, May 19, 2021 at 10:43 PM, Wellington Chevreuil
> > <[email protected]> wrote:
> >
> > > Thank you, Andrew and Duo,
> > >
> > > Talking internally with Josh Elser, the initial idea was to rebase the
> > > feature branch onto master (in order to catch up with the latest
> > > commits), then focus on work to have a minimal functioning hbase; in
> > > other words, together with the already committed work from
> > > HBASE-25391, make sure flush, compactions, splits and merges can all
> > > take advantage of the persistent store file manager and complete with
> > > no need to rely on renames. These all map to the subtasks HBASE-25391,
> > > HBASE-25392 and HBASE-25393. Once we can test and validate that this
> > > works well for our goals, we can then focus on snapshots, bulk loading
> > > and tooling.
> > >
> > > > S3 now supports strong consistency, and I heard that they are also
> > > > implementing atomic renaming currently, so maybe that's one of the
> > > > reasons why the development is silent now...
> > >
> > > Interesting, I had no idea this was being implemented. I know,
> > > however, that a version of this feature is already available on the
> > > latest EMR releases (at least from 6.2.0), and the AWS team has
> > > published their own blog post with their results:
> > >
> > > https://aws.amazon.com/blogs/big-data/amazon-emr-6-2-0-adds-persistent-hfile-tracking-to-improve-performance-with-hbase-on-amazon-s3/
> > >
> > > > But I do not think storing the hfile list in meta is the only
> > > > solution. It will cause cyclic dependencies for hbase:meta, and then
> > > > force us to have a fallback solution, which makes the code a bit
> > > > ugly. We should try to see if this could be done with only the
> > > > FileSystem.
> > >
> > > This is indeed a relevant concern. One idea I had mentioned in the
> > > original design doc was to track committed/non-committed files through
> > > xattr (or tags), which may have its own performance issues as
> > > explained by Stephen Wu, but is something that could be attempted.
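(Just to illustrate the xattr idea above, I guess it would look roughly like the sketch below against the Hadoop FileSystem API; the attribute name is made up. HDFS implements xattrs natively, while an object store would have to map them onto tags or per-object metadata, which is probably where the extra round trips and the performance concern come from.)

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch: mark fully written hfiles as committed via an xattr/tag. */
public class CommittedMarkerSketch {

  private static final String COMMITTED_XATTR = "user.hbase.committed";
  private static final byte[] TRUE = new byte[] { 1 };

  private final FileSystem fs;

  public CommittedMarkerSketch(FileSystem fs) {
    this.fs = fs;
  }

  /** Called after the writer has finished and closed the new hfile. */
  public void markCommitted(Path hfile) throws IOException {
    fs.setXAttr(hfile, COMMITTED_XATTR, TRUE);
  }

  /** Readers skip any hfile that does not carry the committed flag. */
  public boolean isCommitted(Path hfile) {
    try {
      return Arrays.equals(fs.getXAttr(hfile, COMMITTED_XATTR), TRUE);
    } catch (IOException e) {
      // attribute missing or xattrs unsupported: treat as not committed
      return false;
    }
  }
}

Compared to the list-file approach this needs an extra metadata call per hfile when opening a store, so the list file still looks simpler to me, but both avoid renaming the hfiles themselves.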
> > > On Wed, May 19, 2021 at 04:56, 张铎(Duo Zhang)
> > > <[email protected]> wrote:
> > >
> > > > S3 now supports strong consistency, and I heard that they are also
> > > > implementing atomic renaming currently, so maybe that's one of the
> > > > reasons why the development is silent now...
> > > >
> > > > For me, I also think deploying hbase on cloud storage is the future,
> > > > so I would also like to participate here.
> > > >
> > > > But I do not think storing the hfile list in meta is the only
> > > > solution. It will cause cyclic dependencies for hbase:meta, and then
> > > > force us to have a fallback solution, which makes the code a bit
> > > > ugly. We should try to see if this could be done with only the
> > > > FileSystem.
> > > >
> > > > Thanks.
> > > >
> > > > On Wed, May 19, 2021 at 8:04 AM, Andrew Purtell
> > > > <[email protected]> wrote:
> > > >
> > > > > Wellington (et al.),
> > > > >
> > > > > S3 is also an important piece of our future production plans.
> > > > > Unfortunately, we were unable to assist much with last year's
> > > > > work, on account of being sidetracked by more immediate concerns.
> > > > > Fortunately, this renewed interest is timely in that we have an
> > > > > HBase 2 project where, if this can land in a 2.5 or a 2.6, it
> > > > > could be an important cost-to-serve optimization, and one we could
> > > > > and would make use of. Therefore I would like to restate my
> > > > > employer's interest in this work too. It may just be Viraj and
> > > > > myself in the early days.
> > > > >
> > > > > I'm not sure how best to collaborate. We could review changes from
> > > > > the original authors, review new changes, and/or divide up the
> > > > > development tasks. We can certainly offer our time for testing,
> > > > > and can afford the costs of testing against the S3 service.
> > > > >
> > > > > On Tue, May 18, 2021 at 12:16 PM Wellington Chevreuil <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Greetings everyone,
> > > > > >
> > > > > > HBASE-24749 was proposed almost a year ago, introducing a new
> > > > > > StoreFile tracker as a way to allow any hbase hfile
> > > > > > modifications to be safely completed without needing a file
> > > > > > system rename. This seems pretty relevant for deployments over
> > > > > > S3 file systems, where rename operations are not atomic and can
> > > > > > suffer performance degradation when multiple requests are
> > > > > > concurrently submitted to the same bucket. We had done
> > > > > > superficial tests and ycsb runs, where individual renames of
> > > > > > files larger than 5GB can take a few hundred seconds to
> > > > > > complete. We also observed impacts on write load throughput, the
> > > > > > bottleneck potentially being the renames.
> > > > > >
> > > > > > With S3 being an important piece of my employer's cloud
> > > > > > solution, we would like to help it move forward. We plan to
> > > > > > contribute new patches per the original design/Jira, but we'd
> > > > > > also be happy to review changes from the original authors.
> > > > > > Please let us know if anyone has any concerns, otherwise we'll
> > > > > > start to self-assign issues on HBASE-24749.
> > > > > >
> > > > > > Wellington
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrew
> > > > >
> > > > > Words like orphans lost among the crosstalk, meaning torn from
> > > > > truth's decrepit hands
> > > > >    - A23, Crosstalk
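Coming back to Nick's point about a compatibility test harness: a very rough probe like the one below could be run against each FileSystem implementation to check the behaviours we want to rely on. This is only a sketch of the shape of such a check; a real harness would repeat it many times and also cover overwrite and delete visibility.

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Minimal read-after-write / list-after-write probe for a FileSystem. */
public class ReadAfterWriteProbe {

  public static void main(String[] args) throws IOException {
    // pass a base URI, e.g. s3a://my-bucket/probe or hdfs:///tmp/probe
    Path base = new Path(args[0]);
    FileSystem fs = base.getFileSystem(new Configuration());
    fs.mkdirs(base);

    byte[] payload = "hello".getBytes();
    Path file = new Path(base, "probe-" + System.nanoTime());

    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write(payload);
    }

    // 1. read-after-write: the new object must be readable immediately
    byte[] readBack = new byte[payload.length];
    try (FSDataInputStream in = fs.open(file)) {
      in.readFully(readBack);
    }
    if (!Arrays.equals(payload, readBack)) {
      throw new AssertionError("read-after-write returned different data");
    }

    // 2. list-after-write: the new object must show up in a listing
    boolean listed = Arrays.stream(fs.listStatus(base))
        .anyMatch(st -> st.getPath().getName().equals(file.getName()));
    if (!listed) {
      throw new AssertionError("new object missing from parent listing");
    }

    System.out.println("read-after-write and list-after-write look consistent");
  }
}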
