[ https://issues.apache.org/jira/browse/HADOOP-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543621 ]
Jim Kellerman commented on HADOOP-2139:
---------------------------------------

Once this patch passes muster, there are more ways in which we could reduce wasted activity in the region server:

- Each region knows when it needs flushing. If it had a mechanism to notify the region server, the flusher thread could wait for a "flush me" request to arrive, and would only need to wake up periodically (say, 1/10 as often as it does now) to flush regions that have not been flushed recently. Currently it has a counter, and if the counter exceeds a threshold, it flushes the region anyway (if it has any unflushed updates). Using this mechanism, the cache flusher thread would run far less frequently, but would still run when required for regions receiving a lot of updates.

- There is no need to check whether a region needs compacting unless it has been flushed. The flusher thread could notify the compact/split thread after it does a flush.

- We currently do not even try to split a region unless it has been compacted. The region server could keep track of the amount of data that has been flushed to a region since the last split. When the split threshold is crossed, it should do a split.

These changes would eliminate a lot of "busy work" done by the region server.

> [hbase] Increase parallelism in region servers
> ----------------------------------------------
>
>                 Key: HADOOP-2139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2139
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: locking.xls, operation-compatibility.jpg, patch.txt
>
>
> There are a number of paths in the region server which block against one
> another including:
> - log rolling
> - cache flushes
> - region splitting
> - updates
> - scanners
> Investigate which can proceed in parallel, and mechanisms for making some
> operations run in parallel that currently do not.
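
For illustration only, the "flush me" notification and the flush-then-check-compaction hand-off proposed in the comment above might look roughly like the sketch below. Every name in it (FlushSketch, Region, requestFlush, runOnce, compactionRequests, maxFlushAgeMillis) is hypothetical and is not taken from the HBase code base or from the attached patch. It assumes a blocking queue is an acceptable way for regions to notify the flusher, and it passes the current time in as a parameter to keep the logic easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; none of these names come from the HBase sources. */
final class FlushSketch {

    /** Minimal stand-in for a region: counts unflushed updates and remembers its last flush. */
    static final class Region {
        final String name;
        private int unflushedUpdates;
        private long lastFlushMillis;

        Region(String name, long nowMillis) {
            this.name = name;
            this.lastFlushMillis = nowMillis;
        }

        synchronized void update() { unflushedUpdates++; }
        synchronized boolean hasUnflushedUpdates() { return unflushedUpdates > 0; }
        synchronized long lastFlushMillis() { return lastFlushMillis; }
        synchronized void flush(long nowMillis) {
            unflushedUpdates = 0;
            lastFlushMillis = nowMillis;
        }
    }

    // Regions post themselves here when they decide they need flushing.
    private final BlockingQueue<Region> flushRequests = new LinkedBlockingQueue<>();
    // The flusher posts here after each flush so the compact/split thread only
    // examines regions that have actually been flushed.
    final BlockingQueue<Region> compactionRequests = new LinkedBlockingQueue<>();
    private final Map<String, Region> regions = new ConcurrentHashMap<>();
    private final long maxFlushAgeMillis;

    FlushSketch(long maxFlushAgeMillis) {
        this.maxFlushAgeMillis = maxFlushAgeMillis;
    }

    void addRegion(Region region) { regions.put(region.name, region); }

    /** The "flush me" notification sent by a region. */
    void requestFlush(Region region) { flushRequests.offer(region); }

    /**
     * One iteration of the flusher loop. Waits up to waitMillis for a request;
     * if none arrives, falls back to flushing regions that have unflushed
     * updates and have not been flushed for maxFlushAgeMillis.
     * Returns the names of the regions flushed in this iteration.
     */
    List<String> runOnce(long waitMillis, long nowMillis) {
        List<String> flushed = new ArrayList<>();
        Region requested;
        try {
            requested = flushRequests.poll(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return flushed;
        }
        if (requested != null) {
            flushAndNotify(requested, nowMillis, flushed);
        } else {
            for (Region region : regions.values()) {
                if (nowMillis - region.lastFlushMillis() >= maxFlushAgeMillis) {
                    flushAndNotify(region, nowMillis, flushed);
                }
            }
        }
        return flushed;
    }

    private void flushAndNotify(Region region, long nowMillis, List<String> flushed) {
        if (!region.hasUnflushedUpdates()) {
            return; // nothing to write, so nothing for the compactor to look at
        }
        region.flush(nowMillis);
        flushed.add(region.name);
        compactionRequests.offer(region);
    }
}
```

A flusher thread would call runOnce in a loop with a long wait, so it sleeps until a region asks to be flushed and otherwise only wakes for the occasional staleness sweep. The split bookkeeping from the third point (data flushed since the last split) is left out of this sketch.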
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.