shihuafeng created HBASE-26209:
----------------------------------
Summary: edit file loss result in data loss when restart hbase
cluster
Key: HBASE-26209
URL: https://issues.apache.org/jira/browse/HBASE-26209
Project: HBase
Issue Type: Bug
Components: regionserver
Environment: Linux version 3.10.0-693.el7.x86_64
([email protected]) (gcc version 4.8.5 20150623 (Red
Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017
Reporter: shihuafeng
Attachments: Repaly_edit.log, rename_edit.log
h1. when i restart hbase cluster,i find edit file loss when wal
repaly .
the number of edit files (00000000seqid.tmp to 00000000seqid ) is
31 when split wal to edit. But when i read edits to repaly ,i foud the sum
is 30.
i see rename file is sucessful,but i can not find .
i find system error
ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of
length 66, found length 32 (20130517/exfield-389)
Aug 16 03:30:41 esgsh6 kernel: ACPI Error: Method parse/execution failed
[\_SB_.PMI0._PMM] (Node ffff8810e9eab258), AE_AML_BUFFER_LIMIT
(20130517/psparse-536)
h1. {color:#3366ff} see the attachment for more details :{color}
# *rename (00000000seqid.tmp to 00000000seqid is 30*
i can find the follwing file , i confirm the edit file
(0*000000000001825010*)is not empty.
{panel:title=log}
hbase-cmf-hbase-REGIONSERVER-gy11.esgync.local.log.out:2021-08-16 17:56:28,956
INFO org.apache.hadoop.hbase.wal.WALSplitter: Rename
hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0000000000001810650.temp
to
hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0*000000000001825010*
{panel}
**
you can see attachment{color:#999999} {color}
*{color:#3366ff}r{color}*{color:#3366ff}*ename_edit.log*{color}
*2. at replay phase , Reading the edits is 30*
{panel:title=log}
hbase-cmf-hbase-REGIONSERVER-gy11.esgync.local.log.out:2021-08-16
17:56:14,938 INFO org.apache.hadoop.hbase.regionserver.HRegion: after
replayRecoveredEdits Maximum sequenceid 1914955 and minimum sequenceid for the
region is 1916711, replay the file,
path=hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0000000000001914955,seqid=1916711,*size=30*
{panel}
**
{code:java}
org.apache.hadoop.hbase.regionserver.HRegion NavigableSet<Path> files =
WALSplitter.getSplitEditFilesSorted(fs, regiondir);
if (LOG.isDebugEnabled()) {
LOG.debug("Found " + (files == null ? 0 : files.size())
+ " recovered edits file(s) under " + regiondir);
}
if (files == null || files.isEmpty()) return seqid;
long start=System.currentTimeMillis();
for (Path edits: files) {
if (edits == null || !fs.exists(edits)) {
LOG.warn("Null or non-existent edits file: " + edits);
continue;
}
if (isZeroLengthThenDelete(fs, edits)) continue;
long maxSeqId;
String fileName = edits.getName();
maxSeqId = Math.abs(Long.parseLong(fileName));
if (maxSeqId <= minSeqIdForTheRegion) {
if (LOG.isDebugEnabled()) {
String msg = "Maximum sequenceid for this wal is " + maxSeqId
+ " and minimum sequenceid for the region is " + minSeqIdForTheRegion
+ ", skipped the whole file, path=" + edits;
LOG.info(msg);
}
continue;
}
try {
seqid = Math.max(seqid, replayRecoveredEdits(edits, maxSeqIdInStores,
reporter));
// replay the edits. Replay can return -1 if everything is skipped, only update
// if seqId is greater
String msg = "after replayRecoveredEdits Maximum sequenceid " + maxSeqId
+ " and minimum sequenceid for the region is " + minSeqIdForTheRegion
+ ", replay the file, path=" + edits
+",seqid="+seqid+",size="+files.size();
LOG.info(msg);{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)