The HBASE-24749 design and implementation had acknowledged compromises at review time, e.g. adding a new 'system table' to hold store files. I'd suggest the design and implementation need a revisit before we go forward; for instance, factoring for systems other than S3 as suggested above (I like the Duo list).
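For what it's worth, the file-based idea in Duo's message quoted below can be sketched roughly as follows. This is only a minimal illustration, not HBase code: the class and method names (HFileListSketch, commit, load) are hypothetical, and it uses java.nio.file on a local directory as a stand-in for the Hadoop FileSystem API. It also assumes the tmp file is fully written before the rename starts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch of the 'hfile.list' proposal; not part of HBase.
// A local directory stands in for a store directory on HDFS or S3.
final class HFileListSketch {
    static final String LIST = "hfile.list";
    static final String TMP = "hfile.list.tmp";

    // Write the complete hfile list to the tmp file, then rename it over
    // the live file. Called with an empty list when the store is created.
    static void commit(Path storeDir, List<String> hfiles) throws IOException {
        Path tmp = storeDir.resolve(TMP);
        Files.write(tmp, hfiles, StandardCharsets.UTF_8);
        Files.move(tmp, storeDir.resolve(LIST), StandardCopyOption.REPLACE_EXISTING);
    }

    // On store open: prefer 'hfile.list'. If it is missing but the tmp file
    // exists (an interrupted non-atomic rename), finish the rename first.
    // If neither exists, something is wrong and a repair tool has to
    // rebuild the list from a directory listing.
    static List<String> load(Path storeDir) throws IOException {
        Path list = storeDir.resolve(LIST);
        Path tmp = storeDir.resolve(TMP);
        if (!Files.exists(list)) {
            if (!Files.exists(tmp)) {
                throw new IllegalStateException(
                    "neither " + LIST + " nor " + TMP + " found in " + storeDir);
            }
            Files.move(tmp, list);
        }
        return Files.readAllLines(list, StandardCharsets.UTF_8);
    }
}
```

Writing an empty list at store creation is what lets load() treat "both files missing" as an error rather than as a brand-new store. A stale 'hfile.list.tmp' sitting next to a valid 'hfile.list' is simply ignored here and overwritten by the next commit.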
S

On Wed, May 19, 2021 at 8:19 AM 张铎(Duo Zhang) <[email protected]> wrote:

> What about just storing the hfile list in a file? Since S3 now has strong
> consistency, we could safely overwrite a file then, I think?
>
> And since the hfile list file will be very small, renaming will not be a
> big problem.
>
> We could write the hfile list to a file called 'hfile.list.tmp', and then
> rename it to 'hfile.list'.
>
> This is safe for HDFS. For S3, since the rename is not atomic, we could
> end up in a state where the 'hfile.list' file is not there but there is a
> 'hfile.list.tmp'.
>
> So when opening an HStore, we first check if 'hfile.list' is there; if
> not, try 'hfile.list.tmp', rename it and load it. For safety, we could
> write an initial hfile list file with no hfiles. So if we cannot load
> either 'hfile.list' or 'hfile.list.tmp', then we know something is wrong
> and users should try to fix it with HBCK.
> And in HBCK, we will do a listing and generate the 'hfile.list' file.
>
> WDYT?
>
> Thanks.
>
> Wellington Chevreuil <[email protected]> wrote on
> Wed, May 19, 2021 at 10:43 PM:
>
> > Thank you, Andrew and Duo,
> >
> > Talking internally with Josh Elser, the initial idea was to rebase the
> > feature branch with master (in order to catch up with the latest
> > commits), then focus on work to have a minimal functioning hbase; in
> > other words, together with the already committed work from HBASE-25391,
> > make sure flush, compactions, splits and merges all can take advantage
> > of the persistent store file manager and complete with no need to rely
> > on renames. These all map to the subtasks HBASE-25391, HBASE-25392 and
> > HBASE-25393. Once we could test and validate this works well for our
> > goals, we can then focus on snapshots, bulkloading and tooling.
> >
> > > S3 now supports strong consistency, and I heard that they are also
> > > implementing atomic renaming currently, so maybe that's one of the
> > > reasons why the development is silent now..
> > >
> > Interesting, I had no idea this was being implemented. I know, however,
> > a version of this feature is already available on the latest EMR
> > releases (at least from 6.2.0), and the AWS team has published their own
> > blog post with their results:
> >
> > https://aws.amazon.com/blogs/big-data/amazon-emr-6-2-0-adds-persistent-hfile-tracking-to-improve-performance-with-hbase-on-amazon-s3/
> >
> > > But I do not think storing the hfile list in meta is the only
> > > solution. It will cause cyclic dependencies for hbase:meta, and then
> > > force us to have a fallback solution which makes the code a bit ugly.
> > > We should try to see if this could be done with only the FileSystem.
> > >
> > This is indeed a relevant concern. One idea I had mentioned in the
> > original design doc was to track committed/non-committed files through
> > xattr (or tags), which may have its own performance issues as explained
> > by Stephen Wu, but is something that could be attempted.
> >
> > On Wed, May 19, 2021 at 04:56, 张铎(Duo Zhang) <[email protected]>
> > wrote:
> >
> > > S3 now supports strong consistency, and I heard that they are also
> > > implementing atomic renaming currently, so maybe that's one of the
> > > reasons why the development is silent now...
> > >
> > > For me, I also think deploying hbase on cloud storage is the future,
> > > so I would also like to participate here.
> > >
> > > But I do not think storing the hfile list in meta is the only
> > > solution. It will cause cyclic dependencies for hbase:meta, and then
> > > force us to have a fallback solution which makes the code a bit ugly.
> > > We should try to see if this could be done with only the FileSystem.
> > >
> > > Thanks.
> > >
> > > Andrew Purtell <[email protected]> wrote on
> > > Wed, May 19, 2021 at 8:04 AM:
> > >
> > > > Wellington (et al.),
> > > >
> > > > S3 is also an important piece of our future production plans.
> > > > Unfortunately, we were unable to assist much with last year's work,
> > > > on account of being sidetracked by more immediate concerns.
> > > > Fortunately, this renewed interest is timely in that we have an
> > > > HBase 2 project where, if this can land in a 2.5 or a 2.6, it could
> > > > be an important cost-to-serve optimization, and one we could and
> > > > would make use of. Therefore I would like to restate my employer's
> > > > interest in this work too. It may just be Viraj and myself in the
> > > > early days.
> > > >
> > > > I'm not sure how best to collaborate. We could review changes from
> > > > the original authors, new changes, and/or divide up the development
> > > > tasks. We can certainly offer our time for testing, and can afford
> > > > the costs of testing against the S3 service.
> > > >
> > > >
> > > > On Tue, May 18, 2021 at 12:16 PM Wellington Chevreuil
> > > > <[email protected]> wrote:
> > > >
> > > > > Greetings everyone,
> > > > >
> > > > > HBASE-24749 was proposed almost a year ago, introducing a new
> > > > > StoreFile tracker as a way to allow any hbase hfile modifications
> > > > > to be safely completed without needing a file system rename. This
> > > > > seems pretty relevant for deployments over S3 file systems, where
> > > > > rename operations are not atomic and can suffer performance
> > > > > degradation when multiple requests get concurrently submitted to
> > > > > the same bucket. We had done superficial tests and ycsb runs,
> > > > > where individual renames of files larger than 5GB can take a few
> > > > > hundred seconds to complete. We also observed impacts on write
> > > > > load throughput, the bottleneck potentially being the renames.
> > > > >
> > > > > With S3 being an important piece of my employer's cloud solution,
> > > > > we would like to help it move forward.
> > > > > We plan to contribute new patches per the original design/Jira,
> > > > > but we’d also be happy to review changes from the original
> > > > > authors, too. Please let us know if anyone has any concerns,
> > > > > otherwise we’ll start to self-assign issues on HBASE-24749
> > > > >
> > > > > Wellington
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> > > > truth's decrepit hands
> > > >    - A23, Crosstalk
