[ 
https://issues.apache.org/jira/browse/HBASE-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291174#comment-15291174
 ] 

Yu Li commented on HBASE-14623:
-------------------------------

Recently we hit an issue where recovery of the namespace table was blocked by 
WAL splitting for the RS that previously held it. The sequence was:
1. Many RSes, not just the single RS holding the namespace table, crashed 
during a *rolling upgrade* because of a temporary network problem (which made 
all datanodes in the pipeline bad).
2. The master restarted before DLS finished for the RS that previously held 
the region of the namespace table. It got stuck and finally aborted because 
the namespace region did not come online within the timeout 
({{hbase.master.namespace.init.timeout}}, which defaults to 5 min); see 
{{TableNamespaceManager#start}}.

I guess if we could add a mechanism, similar to the one for the meta table, 
that splits and recovers the namespace table earlier, we could avoid this 
problem:
{code:title=SplitLogWorker#taskLoop|borderStyle=solid}
      // pick meta wal firstly
      int offset = (int) (Math.random() * paths.size());
      for (int i = 0; i < paths.size(); i++) {
        if (DefaultWALProvider.isMetaFile(paths.get(i))) {
          offset = i;
          break;
        }
      }
{code}
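
To illustrate the idea, here is a minimal standalone sketch of how the 
selection could be extended, not the actual HBase code. It uses plain strings 
instead of {{Path}}, and it assumes a namespace WAL can be recognized by its 
file name. The {{.namespace}} suffix, the class name and the {{pickOffset}} 
method are all hypothetical; without a dedicated WAL for the namespace table 
there is nothing to match on, which is why this depends on this JIRA.

```java
import java.util.List;
import java.util.Random;

public class WalPickSketch {
    // Assumed for this sketch: meta WAL file names end with ".meta".
    static final String META_SUFFIX = ".meta";
    // Hypothetical: a dedicated namespace WAL would need some recognizable marker.
    static final String NAMESPACE_SUFFIX = ".namespace";

    /**
     * Returns the index of the WAL to try first:
     * a meta WAL if present, otherwise a namespace WAL, otherwise a random one.
     */
    static int pickOffset(List<String> paths, Random random) {
        int namespaceIdx = -1;
        for (int i = 0; i < paths.size(); i++) {
            String p = paths.get(i);
            if (p.endsWith(META_SUFFIX)) {
                return i; // meta always wins
            }
            if (namespaceIdx < 0 && p.endsWith(NAMESPACE_SUFFIX)) {
                namespaceIdx = i; // remember the first namespace WAL, keep looking for meta
            }
        }
        if (namespaceIdx >= 0) {
            return namespaceIdx;
        }
        // No system WAL found: fall back to a random start, as the existing code does.
        return paths.isEmpty() ? 0 : random.nextInt(paths.size());
    }
}
```

With this ordering, a worker that sees both user WALs and a namespace WAL 
would start from the namespace WAL unless a meta WAL is also waiting.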

So maybe this is a good reason for this JIRA to go in? Thanks.

> Implement dedicated WAL for system tables
> -----------------------------------------
>
>                 Key: HBASE-14623
>                 URL: https://issues.apache.org/jira/browse/HBASE-14623
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>              Labels: wal
>             Fix For: 2.0.0
>
>         Attachments: 14623-v1.txt, 14623-v2.txt, 14623-v2.txt, 14623-v2.txt, 
> 14623-v2.txt, 14623-v3.txt, 14623-v4.txt
>
>
> As Stephen suggested in parent JIRA, dedicating separate WAL for system 
> tables (other than hbase:meta) should be done in new JIRA.
> This task is to fulfill the system WAL separation.
> Below is summary of discussion:
> For system table to have its own WAL, we would recover system table faster 
> (fast log split, fast log replay). It would probably benefit 
> AssignmentManager on system table region assignment. At this time, the new 
> AssignmentManager is not planned to change WAL. So the existence of this JIRA 
> is good for overall system, not specific to AssignmentManager.
> There are 3 strategies for implementing system table WAL:
> 1. one WAL for all non-meta system tables
> 2. one WAL for each non-meta system table
> 3. one WAL for each region of non-meta system table
> Currently most system tables are one region table (only ACL table may become 
> big). Choices 2 and 3 basically are the same.
> From implementation point of view, choices 2 and 3 are cleaner than choice 1 
> (as we have already had 1 WAL for META table and we can reuse the logic). 
> With choice 2 or 3, assignment manager performance should not be impacted and 
> it would be easier for assignment manager to assign system table region (eg. 
> without waiting for user table log split to complete for assigning system 
> table region).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
