[ 
https://issues.apache.org/jira/browse/HADOOP-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543621
 ] 

Jim Kellerman commented on HADOOP-2139:
---------------------------------------

Once this patch passes muster, there are more ways in which we could reduce 
wasted activity in the region server:

- each region knows when it needs flushing. If it had a mechanism to notify the 
region server, the flusher thread could wait for a "flush me" request to 
arrive, and would only need to wake up periodically (about 1/10 as often as it 
does now) to flush regions that have not been flushed recently. Currently a 
counter is kept, and once the counter exceeds a threshold, the region is 
flushed anyway (if it has any unflushed updates). With a notification 
mechanism, the cache flusher thread would run far less frequently, but would 
still run when required for regions receiving a lot of updates.

- there is no need to check to see if a region needs compacting unless it has 
been flushed. The flusher thread could notify the compact/split thread after it 
does a flush.

- we currently do not even try to split a region unless it has been 
compacted. The region server could keep track of the amount of data that has 
been flushed to a region since the last split. When the split threshold is 
crossed, it should do a split.

These changes would eliminate a lot of "busy work" done by the region server.
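
To make the three ideas above concrete, here is a rough sketch of how they 
might fit together. It is not taken from the patch or from the existing region 
server code: the class FlushRequestSketch, its Region stand-in, and every 
method and field name are hypothetical, and the "flush" only moves byte 
counters around. The flusher blocks on a queue with a timeout instead of 
polling, queues a compaction check only after a flush, and requests a split 
once enough data has been flushed since the last one.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; none of these names exist in the HBase code. */
final class FlushRequestSketch {

    /** Minimal stand-in for a region; counters are guarded by the region's monitor. */
    static final class Region {
        final String name;
        long unflushedBytes;     // updates accepted since the last flush
        long flushedSinceSplit;  // bytes flushed since the last split request
        boolean flushQueued;     // true while a "flush me" request is pending
        Region(String name) { this.name = name; }
    }

    private final BlockingQueue<Region> flushRequests = new LinkedBlockingQueue<>();
    private final BlockingQueue<Region> compactionRequests = new LinkedBlockingQueue<>();
    private final BlockingQueue<Region> splitRequests = new LinkedBlockingQueue<>();
    private final long flushThreshold;
    private final long splitThreshold;

    FlushRequestSketch(long flushThreshold, long splitThreshold) {
        this.flushThreshold = flushThreshold;
        this.splitThreshold = splitThreshold;
    }

    /** Update path: the region asks for a flush only when it crosses the threshold. */
    void update(Region r, long bytes) {
        boolean request = false;
        synchronized (r) {
            r.unflushedBytes += bytes;
            if (r.unflushedBytes >= flushThreshold && !r.flushQueued) {
                r.flushQueued = true;
                request = true;
            }
        }
        if (request) {
            flushRequests.add(r);
        }
    }

    /**
     * One iteration of the flusher loop. Waits up to timeoutMillis for a
     * request and returns true if a region was flushed. On timeout a real
     * flusher would sweep regions that have not been flushed recently.
     */
    boolean flushOnce(long timeoutMillis) {
        Region r;
        try {
            r = flushRequests.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (r == null) {
            return false;
        }
        boolean split;
        synchronized (r) {
            r.flushedSinceSplit += r.unflushedBytes; // pretend the cache was written out
            r.unflushedBytes = 0;
            r.flushQueued = false;
            split = r.flushedSinceSplit >= splitThreshold;
            if (split) {
                r.flushedSinceSplit = 0;
            }
        }
        compactionRequests.add(r); // compaction is only checked after a flush
        if (split) {
            splitRequests.add(r);  // split is driven by bytes flushed since the last split
        }
        return true;
    }

    int pendingCompactions() { return compactionRequests.size(); }

    int pendingSplits() { return splitRequests.size(); }
}
```

In this sketch the timeout passed to flushOnce plays the role of the 
infrequent periodic wake-up, so an idle region server would do almost no work 
between requests.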


> [hbase] Increase parallelism in region servers
> ----------------------------------------------
>
>                 Key: HADOOP-2139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2139
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: locking.xls, operation-compatibility.jpg, patch.txt
>
>
> There are a number of paths in the region server which block against one 
> another including:
> - log rolling
> - cache flushes
> - region splitting
> - updates
> - scanners
> Investigate which can proceed in parallel, and find mechanisms for 
> parallelizing some operations that currently do not run in parallel.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
