Yeah, that's the struggle with the multiple branches -- we want to see our changes in a version of HBase we're using, but that may not be the right place to land the changes :)

Since this is an "opt-in" and you obviously have _something_ working (given the benchmarks), I'd suggest breaking down the work into some milestones you can track. Set "exit criteria" for each: what do you expect should work when that milestone is "done"? Bulk loads don't have to come right away, but should be there before you can call the feature "done".

Another benefit is that this will make it a bit more manageable for others to get involved and poke at it.

I can also see "fold new system table into hbase:meta" as a later milestone. I think if you can show this works with its own table, it should be much easier to just fold that into meta than building the initial feature :)

On 7/22/20 3:38 AM, Tak-Lon (Stephen) Wu wrote:
Thanks Josh, and yeah object store is a bit different lol.

The major reason we didn't try to fold that into the meta table was that
we don't know how well the meta table can scale. E.g., as Stack
mentioned about a previous design in HBASE-14090 (which matches our
initial estimate), this new data could vary from the 100+ MB level
to the ~5 GB level. Once meta can be split and can handle more work,
we'd definitely move that into meta.
(side note we started with branch-2.2 :p )

Good call on bulk load, thanks. We will also try to support
snapshot-related features well.

-Stephen



On Tue, Jul 21, 2020 at 4:54 PM Josh Elser <els...@apache.org> wrote:

Oh, and don't forget, you have to update bulk load to work with this
approach.

Never knew that we had a utility to pick up files that folks wrote
directly into the hbase.rootdir (RefreshHFilesClient). I am 110% behind
ripping that out. We have bulk loading as the supported path for a reason :)

On 7/21/20 1:45 PM, Tak-Lon (Stephen) Wu wrote:
Hi guys,

I'm sending this email to get more comments and thoughts from the dev@ list
on an open discussion item, HBASE-24749
<https://issues.apache.org/jira/browse/HBASE-24749>.

Mainly, we're proposing a feature with a new store engine that skips the
.tmp directory in the HFile commit stage and writes directly to the data
directory.
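
To make the difference concrete, here is a minimal sketch of the two commit styles on a local file system. It is not HBase code: the function names, the region directory layout, and the in-memory `tracked` set (standing in for persisted HFile tracking) are all assumptions made up for illustration.

```python
import os
import tempfile


def commit_via_tmp(region_dir, name, data):
    """Existing style: write under .tmp, then rename into the data directory."""
    tmp_dir = os.path.join(region_dir, ".tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
    final_path = os.path.join(region_dir, name)
    # A rename is cheap on HDFS but is typically a copy plus delete on object stores.
    os.rename(tmp_path, final_path)
    return final_path


def commit_direct(region_dir, name, data, tracked):
    """Proposed style (sketch): write in place, then record the file as committed."""
    os.makedirs(region_dir, exist_ok=True)
    final_path = os.path.join(region_dir, name)
    with open(final_path, "wb") as f:
        f.write(data)
    # Hypothetical stand-in for persisting the file name to a tracking table.
    tracked.add(name)
    return final_path


def visible_files(region_dir, tracked):
    """Readers trust only files recorded as committed, not a raw directory listing."""
    return sorted(n for n in os.listdir(region_dir) if n in tracked)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as region:
        tracked = set()
        commit_direct(region, "hfile-a", b"cells", tracked)
        print(visible_files(region, tracked))
```

In this sketch a half-written file left behind in the data directory is simply ignored, because it never made it into the tracked set; that is the role the rename out of .tmp plays in the existing style.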

The proposal doc
<https://issues.apache.org/jira/secure/attachment/13008049/Apache%20HBase%20-%20Direct%20insert%20HFiles%20and%20Persist%20in-memory%20HFile%20tracking.pdf>
is on the JIRA and we have provided initial results
<https://issues.apache.org/jira/secure/attachment/13008050/1B100m-25m25m-performance.pdf>
with YCSB 25m and 1B that show a positive effect from the changes.

Improvement Highlights
1. Lower write latency, especially the p99+
2. Higher write throughput on flush and compaction
3. Lower MTTR on region (re)open or assignment
4. Remove consistency-check dependencies (e.g. DynamoDB) that the file
system implementation relies on

Again, any suggestions are welcome.

Thanks,
Stephen
