[ https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506034#comment-14506034 ]

Matteo Bertozzi commented on HBASE-13260:
-----------------------------------------

That number looks pretty good to me, because you are basically running only
availableProcessors() operations at a time.

Try passing 50 as the WAL slot count (there is just one thread in the WAL
itself; it is tuned based on how many threads will be pushing data into the
system). The default is availableProcessors(), so you are doing a sync every 4
or 8 operations. If you bump that to the number of threads that are pushing,
you will probably go way faster.

{noformat}
-    ProcedureExecutor executor = util.getMiniHBaseCluster().getMaster().getMasterProcedureExecutor();
-    store = executor.getStore();
+    //util.getConfiguration().setLong(MasterProcedureConstants.MASTER_PROCEDURE_THREADS, 50);
+    //ProcedureExecutor executor = util.getMiniHBaseCluster().getMaster().getMasterProcedureExecutor();
+    //store = executor.getStore();
+
+    FileSystem fs = util.getMiniHBaseCluster().getMaster().getMasterFileSystem().getFileSystem();
+    final Path logDir = new Path(
+        util.getMiniHBaseCluster().getMaster().getMasterFileSystem().getRootDir(),
+        "testLogs");
+    store = new WALProcedureStore(util.getConfiguration(), fs, logDir, new WALProcedureStore.LeaseRecovery() {
+      @Override
+      public void recoverFileLease(FileSystem fs, Path path) throws IOException {
+        // no-op
+      }
+    });
+
+    // YOU MUST SPECIFY THE NUMBER OF THREADS THAT ARE PUSHING STUFF TO MAKE THE WAL FAST
+    store.start(50);
{noformat}
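To see why the slot count matters, here is a toy model (my own sketch, not HBase code, and the names are hypothetical): if each sync can cover at most min(slots, pushing threads) outstanding operations, then raising the slot count toward the pusher count directly cuts the number of syncs.

```java
// Toy model (not HBase code): estimate how many WAL syncs are needed for
// `ops` operations when `threads` writers push concurrently and the WAL
// has `slots` sync slots. Each sync groups at most min(slots, threads)
// outstanding operations.
public class WalSyncModel {
    static long estimatedSyncs(long ops, int threads, int slots) {
        int batch = Math.min(slots, threads); // ops grouped into one sync
        return (ops + batch - 1) / batch;     // ceiling division
    }

    public static void main(String[] args) {
        // Default slots = availableProcessors() (say 8) with 50 pushers:
        System.out.println(estimatedSyncs(10_000, 50, 8));   // 1250 syncs
        // Slots raised to match the pusher count:
        System.out.println(estimatedSyncs(10_000, 50, 50));  // 200 syncs
    }
}
```

Under this model, going from 8 slots to 50 with 50 pushers reduces the syncs for 10,000 operations from 1250 to 200, which is the kind of speedup the comment is pointing at.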

Still, I don't think the focus should be on comparing write performance (also,
the region WAL has features like compression/encryption and more that it will
be cool to have). So to me it is just a matter of how we can get, with the
current region code, the same WAL shortcuts on replay that we can get with a
simple WAL. Can we throw away the logs when we remove everything from the
memstore? Can we avoid the delete markers, and similar?

> Bootstrap Tables for fun and profit 
> ------------------------------------
>
>                 Key: HBASE-13260
>                 URL: https://issues.apache.org/jira/browse/HBASE-13260
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
>
>
> Over at the ProcV2 discussions(HBASE-12439) and elsewhere I was mentioning an 
> idea where we may want to use regular old regions to store/persist some data 
> needed for HBase master to operate. 
> We regularly use system tables for storing system data. acl, meta, namespace, 
> quota are some examples. We also store the table state in meta now. Some data 
> is persisted in zk only (replication peers and replication state, etc). We 
> are moving away from zk as a permanent storage. As any self-respecting 
> database does, we should store almost all of our data in HBase itself. 
> However, we have an "availability" dependency between different kinds of 
> data. For example all system tables need meta to be assigned first. All 
> master operations need ns table to be assigned, etc. 
> For at least two types of data, (1) procedure v2 states, (2) RS groups in 
> HBASE-6721 we cannot depend on meta being assigned since "assignment" itself 
> will depend on accessing this data. The solution in (1) is to implement a 
> custom WAL format, and custom recover lease and WAL recovery. The solution in 
> (2) is to have the table to store this data, but also cache it in zk for 
> bootstrapping initial assignments. 
> For solving both of the above (and possible future use cases if any), I 
> propose we add a "bootstrap table" concept, which is: 
>  - A set of predefined tables hosted in a separate dir in HDFS. 
>  - A table is only 1 region, not splittable 
>  - Not assigned through regular assignment 
>  - Hosted only on 1 server (typically master)
>  - Has a dedicated WAL. 
>  - A service does WAL recovery + fencing for these tables. 
> This has the benefit of using a region to keep the data, but frees us to 
> re-implement caching and we can use the same WAL / Memstore / Recovery 
> mechanisms that are battle-tested. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)