[
https://issues.apache.org/jira/browse/HBASE-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409126#comment-13409126
]
chunhui shen commented on HBASE-6335:
-------------------------------------
Distributed log splitting can be very slow in the case described in
HBASE-6309.
In our environment, distributed-log-splitting took 3+ hours, while
local-master-log-splitting took only 22+ minutes (9 regionservers, 2500 regions
per hlog file, 280 hlog files in total, 16 ms per rename operation; a rename
normally takes about 2 ms, but it took much longer at that time, possibly
because of high NN load).
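As a rough sanity check on these numbers, the sketch below estimates the total
rename time. It assumes one rename per region per hlog file and that the
renames run one after another; both are assumptions made for illustration, not
something stated in this comment. The function name is hypothetical.

```python
def serial_rename_seconds(hlog_files, regions_per_file, rename_ms):
    """Total time in seconds if every rename runs serially.

    Assumption (not from the comment): one rename per region per hlog file.
    """
    return hlog_files * regions_per_file * rename_ms / 1000.0


# Figures quoted in the comment: 280 hlog files, 2500 regions per file.
slow = serial_rename_seconds(280, 2500, 16)    # 16 ms per rename (observed)
normal = serial_rename_seconds(280, 2500, 2)   # 2 ms per rename (usual case)

print(f"{slow / 3600:.1f} hours at 16 ms per rename")
print(f"{normal / 60:.1f} minutes at 2 ms per rename")
```

Under these assumptions the 16 ms case comes to about 3.1 hours, which is in
line with the 3+ hours reported for distributed-log-splitting.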
If you kill the master while log splitting is in progress because
distributed-log-splitting is too slow, and then switch to
local-master-log-splitting, you will lose data.
> Switching log-splitting policy after last failure master start may cause data
> loss
> ----------------------------------------------------------------------------------
>
> Key: HBASE-6335
> URL: https://issues.apache.org/jira/browse/HBASE-6335
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.92.1, 0.94.0
> Reporter: chunhui shen
> Assignee: chunhui shen
>
> How does it happen?
> Suppose server A goes down with three log files, and all the data belongs to
> one region.
> File 1: kv01 kv02 kv03
> File 2: kv04 kv05 kv06
> File 3: kv07 kv08 kv09
> Here, kv01 means that the edit's log seqID is 01.
> Case: Switch from distributed-log-splitting to master-local-log-splitting
> 1. The master finds that server A is down and starts to split its log files
> using distributed-log-splitting.
> 2. File 2 is split successfully and moved to oldLogs, generating one edits
> file named 06 in the region's recovered.edits dir.
> 3. The master restarts with the log-splitting policy changed to
> master-local-log-splitting, and starts to split file 1 and file 3.
> 4. File 1 and file 3 are split successfully, generating one edits file named
> 09 in the region's recovered.edits dir.
> 5. The region replays edits files 06 and 09. The region's seqID is 06 after it
> replays file 06, so when replaying file 09 it skips kv01, kv02, and kv03,
> and this data is lost.
> Likewise, switching from master-local-log-splitting to
> distributed-log-splitting could also cause data loss.
> Should we fix this bug or avoid the case? I'm not sure...
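The replay behaviour in step 5 can be illustrated with a small simulation. This
is a simplified model for illustration only, not HBase code: each edits file is
keyed by its name (the highest seqID it contains), files are replayed in
ascending name order, an edit is skipped when its seqID is not greater than the
region's current seqID, and the region's seqID advances to the file name after
each file. The function and variable names are hypothetical.

```python
def replay(edit_files):
    """Replay edits files in ascending name order.

    edit_files maps a file name (its highest seqID) to the list of seqIDs
    it contains. Returns (applied, skipped) seqID lists.
    Simplified model of the scenario in the issue, not the HBase code.
    """
    region_seq_id = 0
    applied, skipped = [], []
    for name in sorted(edit_files):
        for seq_id in edit_files[name]:
            if seq_id <= region_seq_id:
                skipped.append(seq_id)   # treated as already applied
            else:
                applied.append(seq_id)
        # After a file is replayed, the region's seqID is the file's max seqID.
        region_seq_id = max(region_seq_id, name)
    return applied, skipped


# Policy switched mid-recovery: file 2 split alone (edits file 06),
# then file 1 and file 3 split together (edits file 09).
mixed = {6: [4, 5, 6], 9: [1, 2, 3, 7, 8, 9]}
mixed_applied, mixed_skipped = replay(mixed)
print(mixed_applied, mixed_skipped)

# One edits file per log file throughout: 03, 06, 09.
per_file = {3: [1, 2, 3], 6: [4, 5, 6], 9: [7, 8, 9]}
per_file_applied, per_file_skipped = replay(per_file)
print(per_file_applied, per_file_skipped)
```

In the mixed case the model skips seqIDs 1, 2, and 3, matching the loss of
kv01, kv02, and kv03 described above; with one edits file per log file nothing
is skipped.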