[ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576225#comment-13576225 ]

Enis Soztutar commented on HBASE-7006:
--------------------------------------

These are excellent results, especially with a large number of regions. We will 
also benefit from other improvements in connection management, region discovery, 
etc., which means those numbers can go even lower. Let's try to get this in 
with the current set of changes; then, as we debug and learn more, we can 
do follow-ups. 

One thing we did not test is the Bigtable approach, as an alternative to writing 
one file per region per WAL file. Namely, for each WAL file, read up to the DFS 
block size (128 MB), sort the edits by region in memory, and write one file per 
block, with a simple per-region index in each file. I'm not sure how we could 
test that easily, though. 
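A rough, in-memory sketch of the Bigtable-style idea, assuming edits can be modeled by their target region, sequence id, and serialized size. `WalEdit`, `SplitBlock`, `split` and `seal` are hypothetical names, not HBase classes; a real version would read the WAL from HDFS and write each block as a file with an on-disk index rather than returning Java objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BlockSplitSketch {

    // Hypothetical stand-in for a WAL entry: target region, sequence id, serialized size.
    record WalEdit(String region, long seqId, int sizeBytes) {}

    // One output "file": edits sorted by region, plus region -> {first edit index, edit count}.
    record SplitBlock(List<WalEdit> edits, Map<String, int[]> regionIndex) {}

    // Reads one WAL's edits in order and cuts a block whenever adding the next edit
    // would push the buffered size past blockSizeBytes (128 MB in the proposal).
    static List<SplitBlock> split(List<WalEdit> wal, long blockSizeBytes) {
        List<SplitBlock> blocks = new ArrayList<>();
        List<WalEdit> buffer = new ArrayList<>();
        long buffered = 0;
        for (WalEdit edit : wal) {
            if (!buffer.isEmpty() && buffered + edit.sizeBytes() > blockSizeBytes) {
                blocks.add(seal(buffer));
                buffer = new ArrayList<>();
                buffered = 0;
            }
            buffer.add(edit);
            buffered += edit.sizeBytes();
        }
        if (!buffer.isEmpty()) {
            blocks.add(seal(buffer));
        }
        return blocks;
    }

    // Sorts the buffered edits by region (then seqId) in memory and builds the per-region index.
    static SplitBlock seal(List<WalEdit> buffer) {
        buffer.sort(Comparator.comparing(WalEdit::region).thenComparingLong(WalEdit::seqId));
        Map<String, int[]> index = new LinkedHashMap<>();
        for (int i = 0; i < buffer.size(); i++) {
            int[] range = index.get(buffer.get(i).region());
            if (range == null) {
                // Edits are sorted, so the first occurrence marks where this region starts.
                range = new int[] {i, 0};
                index.put(buffer.get(i).region(), range);
            }
            range[1]++;
        }
        return new SplitBlock(List.copyOf(buffer), index);
    }

    public static void main(String[] args) {
        List<WalEdit> wal = List.of(
                new WalEdit("regionB", 1, 40),
                new WalEdit("regionA", 2, 40),
                new WalEdit("regionB", 3, 40),
                new WalEdit("regionA", 4, 40),
                new WalEdit("regionC", 5, 40));
        // A tiny 130-byte block size so this five-edit WAL yields two blocks.
        List<SplitBlock> blocks = split(wal, 130);
        for (int b = 0; b < blocks.size(); b++) {
            for (Map.Entry<String, int[]> e : blocks.get(b).regionIndex().entrySet()) {
                System.out.printf("block %d: %s start=%d count=%d%n",
                        b, e.getKey(), e.getValue()[0], e.getValue()[1]);
            }
        }
    }
}
```

With the sample input, the first block holds the first three edits, and its index places regionA at position 0 (1 edit) and regionB at position 1 (2 edits). The design point being sketched is that the number of output files per WAL depends on the WAL's size rather than on how many regions it touches, which is where the per-region-per-WAL approach hurts with many regions; a region server replaying one region would then look up its range in each block's index.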
                
> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>             Fix For: 0.96.0
>
>         Attachments: LogSplitting Comparison.pdf, 
> ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 
> 1700 WALs to replay. Replay took almost an hour. It looks like it could run 
> faster: much of the time is spent zk'ing and nn'ing (talking to ZooKeeper and 
> the NameNode).
> Putting in 0.96 so it gets a look at least.  Can always punt.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
