[
https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519954#comment-14519954
]
Enis Soztutar commented on HBASE-13260:
---------------------------------------
This is a useful exercise (although, Nick, sorry for delaying the RC). I did
some more experiments yesterday. A pure FSHLog-based proc store can do 1M
procedures in ~20 seconds, which is more or less on par with the current
WAL-based one. To figure out where the bottleneck is, I changed the region-based
proc store to instead use {{numShards}} tables and do simple sharding.
With 8+ shards, the total write time is 25-30 secs (compared to the ~20 secs of
the pure FSHLog store):
{code}
4 shards:
java.lang.AssertionError: Wrote 1000000 procedures with 50 threads with
useProcV2Wal=false hsync=false in 2mins, 7.265sec (127.265sec)
8 shards:
java.lang.AssertionError: Wrote 1000000 procedures with 50 threads with
useProcV2Wal=false hsync=false in 31.0280sec (31.028sec)
16 shards:
java.lang.AssertionError: Wrote 1000000 procedures with 50 threads with
useProcV2Wal=false hsync=false in 25.4330sec (25.433sec)
32 shards:
java.lang.AssertionError: Wrote 1000000 procedures with 50 threads with
useProcV2Wal=false hsync=false in 25.9470sec (25.947sec)
{code}
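Simple sharding here can be as little as a modulo over the procedure id; a
rough sketch (class and method names are illustrative only, the actual code is
in the branch linked below):
{code}
import org.apache.hadoop.hbase.regionserver.HRegion;

// Illustrative only: route each procedure write to one of numShards
// single-region shard tables by taking procId modulo the shard count.
public class ShardedProcStore {
  private final HRegion[] shards; // one HRegion per shard table

  public ShardedProcStore(HRegion[] shards) {
    this.shards = shards;
  }

  HRegion shardFor(long procId) {
    return shards[(int) (procId % shards.length)];
  }
}
{code}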
So it seems that the bottleneck is not the CPU, but HRegion's concurrency. I
have also done some basic testing with ASYNC_WAL and SKIP_WAL, which,
surprisingly, were slower than SYNC_WAL. I think it is an area worth digging
into later. Maybe I am still missing something (the code is at
https://github.com/enis/hbase/tree/hbase-13260-review).
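For anyone reproducing the ASYNC_WAL/SKIP_WAL runs, durability can be set per
mutation; {{Durability}} is the real HBase enum, while the constants and the
free variables in this snippet are illustrative:
{code}
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative: persist one serialized procedure with a chosen durability.
byte[] family = Bytes.toBytes("p");       // hypothetical column family
byte[] qualifier = Bytes.toBytes("d");    // hypothetical qualifier
Put put = new Put(Bytes.toBytes(procId)); // procId: the procedure's id
put.addColumn(family, qualifier, serializedProc);
put.setDurability(Durability.ASYNC_WAL);  // or SKIP_WAL / SYNC_WAL
region.put(put);                          // region: the proc store HRegion
{code}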
I am not suggesting that we do sharding for the store just to get around the
region concurrency problem. Any improvement in region concurrency is definitely
a big win for both regular data and proc metadata, but I am not sure whether we
can get there soon enough.
I like the idea of different stores for different kinds of procedures. We
already keep (some) assignment state in meta (in zk-less AM), which is kind of
like a custom proc on a meta proc store. In an alternative world, we could have
used the meta table for everything (table descriptors, table state,
assignments, lists of region files) and been done with it.
To keep the amount of work down, though, I think we should choose one and stick
with it. Otherwise, we have to support two alternative code paths, migration
code, upgrades, etc. That is just wasted effort, I think. Whether to go with
the WAL-based one or the region-based one is a question of the design of
proc-based assignment, since for DDL ops it does not matter. Unfortunately,
that design is not formalized yet. If we end up splitting meta, we could even
do the proc store on meta.
If we think that assignment using procs will use the local proc store in the
master, we should go with the WAL-based one, since I don't think doing sharding
for the region-based one is right. Otherwise, we should go with the region-based
one. Sorry this is vague, but since we have yet to figure out the specifics of
the new AM, it is hard to decide.
> Bootstrap Tables for fun and profit
> ------------------------------------
>
> Key: HBASE-13260
> URL: https://issues.apache.org/jira/browse/HBASE-13260
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.1.0
>
> Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
>
>
> Over at the ProcV2 discussions (HBASE-12439) and elsewhere, I have mentioned
> an idea where we may want to use regular old regions to store/persist some
> data needed for the HBase master to operate.
> We regularly use system tables for storing system data: acl, meta, namespace,
> and quota are some examples. We also store the table state in meta now. Some
> data is persisted only in zk (replication peers and replication state, etc.),
> but we are moving away from zk as permanent storage. As any self-respecting
> database does, we should store almost all of our data in HBase itself.
> However, we have an "availability" dependency between different kinds of
> data. For example, all system tables need meta to be assigned first, and all
> master operations need the ns table to be assigned, etc.
> For at least two types of data, (1) procedure v2 states and (2) RS groups in
> HBASE-6721, we cannot depend on meta being assigned, since "assignment" itself
> will depend on accessing this data. The solution in (1) is to implement a
> custom WAL format, with custom lease recovery and WAL recovery. The solution
> in (2) is to have a table to store this data, but also cache it in zk for
> bootstrapping initial assignments.
> For solving both of the above (and possible future use cases, if any), I
> propose we add a "bootstrap table" concept (see the sketch after this list),
> which is:
> - A set of predefined tables hosted in a separate dir in HDFS.
> - A table is only 1 region and is not splittable
> - Not assigned through regular assignment
> - Hosted only on 1 server (typically master)
> - Has a dedicated WAL.
> - A service does WAL recovery + fencing for these tables.
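> A minimal sketch of what such a table's descriptor might look like, assuming
> the existing HTableDescriptor API (the table name, family, and everything
> else here are illustrative, not taken from the prototype patch):
> {code}
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy;
>
> // Hypothetical bootstrap table: one column family, and a split policy
> // that pins it at exactly one region.
> HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("hbase:proc"));
> htd.addFamily(new HColumnDescriptor("p"));
> htd.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());
> {code}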
> This has the benefit of using a region to keep the data, frees us from
> re-implementing caching, and lets us use the same WAL / Memstore / Recovery
> mechanisms that are battle-tested.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)