Yizhou Wang created HBASE-27447:
-----------------------------------
Summary: hlog loss result in data loss when regionserver down
network card
Key: HBASE-27447
URL: https://issues.apache.org/jira/browse/HBASE-27447
Project: HBase
Issue Type: Bug
Components: master, wal
Affects Versions: 2.2.7
Reporter: Yizhou Wang
While testing HBase replication, I found that data held in memory was lost.
Reading the HBase source code, I found that after HBase splits the meta
table's WAL, it deletes the entire WAL directory, even though the hlog files
that hold the memstore data are still in that directory. This eventually
causes data loss.
The steps to reproduce are as follows:
# Put a few rows into an HBase table.
# Bring down the network interface on all regionserver nodes.
# Bring the network interface back up on all regionserver nodes; the
regionservers will be killed.
# Restart the HBase cluster and scan the table; it contains no data.
The HMaster log prints:
master.MasterWalManager: Log dir for server xxx does not exist
The splitLogDistributed function in SplitLogManager.java causes this issue.
HMaster first calls splitLogDistributed to publish the split task for the
meta table's WAL. After the meta split task completes, the directory is
deleted. HMaster then tries to publish a second, normal split task to recover
the remaining data, but the WAL directory can no longer be found.
{code:java}
waitForSplittingCompletion(batch, status);
...
for (Path logDir : logDirs) {
  status.setStatus("Cleaning up log directory...");
  final FileSystem fs = logDir.getFileSystem(conf);
  try {
    if (fs.exists(logDir) && !fs.delete(logDir, false)) {
      LOG.warn("Unable to delete log src dir. Ignoring. " + logDir);
    }
  } catch (IOException ioe) {
    FileStatus[] files = fs.listStatus(logDir);
    if (files != null && files.length > 0) {
      LOG.warn("Returning success without actually splitting and "
        + "deleting all the log files in path " + logDir + ": "
        + Arrays.toString(files), ioe);
    } else {
      LOG.warn("Unable to delete log src dir. Ignoring. " + logDir, ioe);
    }
  }
}
{code}
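The excerpt above calls fs.delete(logDir, false), that is, a non-recursive delete. To make clear what a cleanup step should guarantee here, the following is a minimal sketch of a "remove the WAL directory only when it is empty" check. It is not HBase code: it uses java.nio.file on the local filesystem as a stand-in for the Hadoop FileSystem API, and the class name WalDirCleanupSketch, the method cleanUpIfEmpty, and the file names are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WalDirCleanupSketch {

    /**
     * Attempts a non-recursive delete of logDir (local filesystem stand-in).
     * Returns true if the directory no longer exists afterwards, false if it
     * was kept because it still contains files.
     */
    public static boolean cleanUpIfEmpty(Path logDir) throws IOException {
        if (!Files.exists(logDir)) {
            return true; // nothing to clean up
        }
        try {
            // Files.delete never recurses; on the default local filesystem it
            // fails with DirectoryNotEmptyException if anything is left inside.
            Files.delete(logDir);
            return true;
        } catch (DirectoryNotEmptyException e) {
            // Unsplit files remain: keep the directory so a later split
            // task can still find and replay them.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Files.createTempDirectory("wal-sketch");
        // Hypothetical file names: one meta WAL and one regular WAL.
        Path metaWal = Files.createFile(logDir.resolve("server.meta.wal"));
        Path userWal = Files.createFile(logDir.resolve("server.default.wal"));

        Files.delete(metaWal); // stands in for "meta WAL split finished"
        System.out.println("removed after meta split: " + cleanUpIfEmpty(logDir));

        Files.delete(userWal); // stands in for "remaining WAL split finished"
        System.out.println("removed after full split: " + cleanUpIfEmpty(logDir));
    }
}
```

In this model the directory survives the first cleanup because the regular WAL file is still present, and it is removed only after that file has been processed as well. Whether the real cleanup loop behaves this way on HDFS in the scenario described above is exactly what needs to be checked.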
My English is poor, so if anything is unclear, please leave me a message.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)