[
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946911#comment-13946911
]
Enis Soztutar commented on HBASE-10829:
---------------------------------------
Here is a log of events in case you are interested. The region
(9514935e6a659bd90faa21bf458a842e) was happily hosted by some region server.
After the writes have settled down, the region had some un-flushed data. The
last flush happened, and after some time, the write tasks finished, so no more
data was coming in for some time:
{code}
2014-03-24 20:54:30,924 INFO [Thread-22] regionserver.HRegion: Finished
memstore flush of ~128.2 M/134443296, currentsize=12.7 M/13270608 for region
IntegrationTestBigLinkedList,\x07\xFE\xDA\x1Chv\xF9\x7F\x18s\xEE\x0C\x85X\xFCU,1395690539958.9514935e6a659bd90faa21bf458a842e.
in 7324ms, sequenceid=119978, compaction requested=true
{code}
After some more time, the region decided to do a compaction. At this point no
writes were coming.
{code}
compaction
2014-03-24 20:55:52,764 INFO
[regionserver60020-smallCompactions-1395694311085] regionserver.HStore:
Starting compaction of 5 file(s) in meta of
IntegrationTestBigLinkedList,\x07\xFE\xDA\x1Chv\xF9\x7F\x18s\xEE\x0C\x85X\xFCU,1395690539958.9514935e6a659bd90faa21bf458a842e.
into
tmpdir=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/.tmp,
totalSize=212.2 M
{code}
After this compaction, but before any more flush, the region server got killed
around:
{code}
2014-03-24 20:56:44,466 DEBUG [regionserver60020-EventThread]
regionserver.SplitLogWorker: tasks arrived or departed
{code}
Because of the region server got killed, the cluster performed a log split,
which completed without any issues (logs are not necessary). This resulted in 7
log files to be split, resulting in 7 files in recovered.edits under region
dir.
Then, some other region server opens the region and applies the recovered edits
in memory:
{code}
Open region:
2014-03-24 20:57:28,196 DEBUG [StoreOpener-9514935e6a659bd90faa21bf458a842e-1]
regionserver.HStore: loaded
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/meta/02f7152afee34b07b40fa31e0de5a3de,
isReference=false, isBulkLoadResult=false, seqid=119978, majorCompaction=false
2014-03-24 20:57:28,240 DEBUG [StoreOpener-9514935e6a659bd90faa21bf458a842e-1]
regionserver.HStore: loaded
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/meta/69260a6d4ffc45a1806dd501204b73ce,
isReference=false, isBulkLoadResult=false, seqid=88532, majorCompaction=true
2014-03-24 20:57:28,264 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Replaying edits from
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000118699
2014-03-24 20:57:28,457 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Applied 0, skipped 187830, firstSequenceidInLog=118084,
maxSequenceidInLog=-1,
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000118699
2014-03-24 20:57:28,460 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Replaying edits from
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000119351
2014-03-24 20:57:28,630 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Applied 0, skipped 199401, firstSequenceidInLog=118700,
maxSequenceidInLog=-1,
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000119351
2014-03-24 20:57:28,632 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Replaying edits from
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000120086
2014-03-24 20:57:28,873 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Applied 37938, skipped 148185,
firstSequenceidInLog=119352, maxSequenceidInLog=120086,
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000120086
2014-03-24 20:57:28,876 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Replaying edits from
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000120806
2014-03-24 20:57:30,130 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Applied 189648, skipped 0, firstSequenceidInLog=120087,
maxSequenceidInLog=120806,
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000120806
2014-03-24 20:57:30,133 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Replaying edits from
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000121436
2014-03-24 20:57:31,622 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Applied 187988, skipped 0, firstSequenceidInLog=120807,
maxSequenceidInLog=121436,
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000121436
2014-03-24 20:57:31,625 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Replaying edits from
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000122408
2014-03-24 20:57:33,850 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Applied 206643, skipped 0, firstSequenceidInLog=121437,
maxSequenceidInLog=122408,
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000122408
2014-03-24 20:57:33,860 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Replaying edits from
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000122409
2014-03-24 20:57:33,864 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HStore: Completing compaction from the WAL marker ====> we replay
the compaction marker
2014-03-24 20:57:33,866 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Applied 0, skipped 1, firstSequenceidInLog=122409,
maxSequenceidInLog=-1, ====> contains only compaction record
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000122409
{code}
As per above, some of the edits are skipped, and the rest is applied to
memstore. The latest flush file had seqId=119978, and we were able to replay up
to seqId=122409. However notice that the final log file contains ONLY a
compaction record! This record has the seqId, but we still return -1 to the
caller. This results in the region skipping the mandatory memstore flush BEFORE
opening the region for writes and deleting the log files:
{code}
try {
seqid = replayRecoveredEdits(edits, maxSeqIdInStores, reporter);
} catch (IOException e) {
}
if (seqid > minSeqIdForTheRegion) {
// Then we added some edits to memory. Flush and cleanup split edit files.
internalFlushcache(null, seqid, status);
}
{code}
Now, the region is opened without the flush, and we also delete the
recovered.edit logs thinking that we made the edits durable:
{code}
2014-03-24 20:57:33,893 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Deleted recovered.edits
file=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000118699
2014-03-24 20:57:33,983 DEBUG [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Deleted recovered.edits
file=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0000000000000119351
2014-03-24 20:57:34,110 INFO [RS_OPEN_REGION-hor9n08:60020-2]
regionserver.HRegion: Onlined 9514935e6a659bd90faa21bf458a842e; next
sequenceid=119979 ====> NO FLUSH
{code}
Notice the lack of flush above. Since the CM is still running, now this region
server is killed before a flush happens in this region, resulting in data loss
on the in memory data. Since recovered.edits files are also deleted, there is
no recovery back.
Thanks to [~devaraj] for pair-debugging this issue.
> Flush is skipped after log replay if the last recovered edits file is skipped
> -----------------------------------------------------------------------------
>
> Key: HBASE-10829
> URL: https://issues.apache.org/jira/browse/HBASE-10829
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Critical
> Fix For: 0.99.0, 0.98.2, 0.96.3
>
>
> We caught this in an extended test run where IntegrationTestBigLinkedList
> failed with some missing keys.
> The problem is that HRegion.replayRecoveredEdits() would return -1 if all the
> edits in the log file is skipped, which is true for example if the log file
> only contains a single compaction record (HBASE-2231) or somehow the edits
> cannot be applied (column family deleted, etc).
> The callee, HRegion.replayRecoveredEditsIfAny() only looks for the last
> returned seqId to decide whether a flush is necessary or not before opening
> the region, and discarding replayed recovered edits files.
> Therefore, if the last recovered edits file is skipped but some edits from
> earlier recovered edits files are applied, the mandatory flush before opening
> the region is skipped. If the region server dies after this point before a
> flush, the edits are lost.
> This is important to fix, though the sequence of events are super rare for a
> production cluster.
--
This message was sent by Atlassian JIRA
(v6.2#6252)