[
https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647328#comment-13647328
]
stack commented on HBASE-7006:
------------------------------
Thinking on it, flushing after all logs are recovered is a bad idea because it is a
special case. Replayed mutations, as is, are treated like any other inbound
edit. I think this is good.
Turning off WALs and flushing at the end while trying to figure out what we failed to
write, or writing hfiles directly -- if you could, and I don't think you can,
since edits need to be sorted in an hfile -- by-passing memstore and then
telling the Region to pick up the new hfile when done: all of these introduce new states
that we will have to manage, complicating critical recovery.
> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
> Key: HBASE-7006
> URL: https://issues.apache.org/jira/browse/HBASE-7006
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Reporter: stack
> Assignee: Jeffrey Zhong
> Priority: Critical
> Fix For: 0.95.1
>
> Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch,
> hbase-7006-combined-v3.patch, hbase-7006-combined-v4.patch, LogSplitting
> Comparison.pdf,
> ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had
> 1700 WALs to replay. Replay took almost an hour. It looks like it could run
> faster; much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least. Can always punt.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira