Thank you, Andrew and Duo.

Talking internally with Josh Elser, our initial idea was to rebase the feature branch onto master (in order to catch up with the latest commits), then focus on the work needed to get a minimal functioning HBase. In other words, together with the already committed work from HBASE-25391, make sure flushes, compactions, splits and merges can all take advantage of the persistent store file manager and complete without relying on renames. These all map to the subtasks HBASE-25391, HBASE-25392 and HBASE-25393. Once we have tested and validated that this works well for our goals, we can then focus on snapshots, bulk loading and tooling.

> S3 now supports strong consistency, and I heard that they are also
> implementing atomic renaming currently, so maybe that's one of the reasons
> why the development is silent now...
>

Interesting, I had no idea this was being implemented. I know, however, that a version of this feature is already available on the latest EMR releases (at least from 6.2.0), and the AWS team has published their own blog post with their results:
https://aws.amazon.com/blogs/big-data/amazon-emr-6-2-0-adds-persistent-hfile-tracking-to-improve-performance-with-hbase-on-amazon-s3/

> But I do not think store hfile list in meta is the only solution. It will
> cause cyclic dependencies for hbase:meta, and then force us to have a
> fallback solution which makes the code a bit ugly. We should try to see if
> this could be done with only the FileSystem.
>

This is indeed a relevant concern. One idea I had mentioned in the original design doc was to track committed/non-committed files through xattr (or tags), which may have its own performance issues as explained by Stephen Wu, but is something that could be attempted.

On Wed, May 19, 2021 at 04:56, 张铎(Duo Zhang) <[email protected]> wrote:

> S3 now supports strong consistency, and I heard that they are also
> implementing atomic renaming currently, so maybe that's one of the reasons
> why the development is silent now...
>
> For me, I also think deploying hbase on cloud storage is the future, so I
> would also like to participate here.
>
> But I do not think store hfile list in meta is the only solution. It will
> cause cyclic dependencies for hbase:meta, and then force us to have a
> fallback solution which makes the code a bit ugly. We should try to see if
> this could be done with only the FileSystem.
>
> Thanks.
>
> Andrew Purtell <[email protected]> wrote on Wed, May 19, 2021 at 8:04 AM:
>
> > Wellington (et al.),
> >
> > S3 is also an important piece of our future production plans.
> > Unfortunately, we were unable to assist much with last year's work, on
> > account of being sidetracked by more immediate concerns. Fortunately, this
> > renewed interest is timely in that we have an HBase 2 project where, if
> > this can land in a 2.5 or a 2.6, it could be an important cost-to-serve
> > optimization, and one we could and would make use of. Therefore I would
> > like to restate my employer's interest in this work too. It may just be
> > Viraj and myself in the early days.
> >
> > I'm not sure how best to collaborate. We could review changes from the
> > original authors, new changes, and/or divide up the development tasks. We
> > can certainly offer our time for testing, and can afford the costs of
> > testing against the S3 service.
> >
> >
> > On Tue, May 18, 2021 at 12:16 PM Wellington Chevreuil <
> > [email protected]> wrote:
> >
> > > Greetings everyone,
> > >
> > > HBASE-24749 was proposed almost a year ago, introducing a new
> > > StoreFile tracker as a way to allow any hbase hfile modifications to be
> > > safely completed without needing a file system rename. This seems pretty
> > > relevant for deployments over S3 file systems, where rename operations are
> > > not atomic and can suffer performance degradation when multiple requests
> > > get concurrently submitted to the same bucket. We had done superficial
> > > tests and ycsb runs, where individual renames of files larger than 5GB can
> > > take a few hundred seconds to complete. We also observed impacts on
> > > write load throughput, the bottleneck potentially being the renames.
> > >
> > > With S3 being an important piece of my employer's cloud solution, we would
> > > like to help it move forward. We plan to contribute new patches per the
> > > original design/Jira, but we’d also be happy to review changes from the
> > > original authors, too.
> > > Please let us know if anyone has any concerns,
> > > otherwise we’ll start to self-assign issues on HBASE-24749
> > >
> > > Wellington
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >    - A23, Crosstalk
> >
>
