Wei Yan created HDFS-12800:
------------------------------
Summary: Potential disk/block missing when DataNode upgrade with
data layout changed
Key: HDFS-12800
URL: https://issues.apache.org/jira/browse/HDFS-12800
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
During a DataNode upgrade that changes the data layout, we found that some disks
were not converted to the new layout version, which left some blocks missing. The
root cause is a race condition in the doUpgrade process.
In the current loadBlockPoolSliceStorage implementation in DataStorage.java, each
iteration of the datadir for-loop restores trash, generates the upgrade tasks for
that datadir, and submits them for execution at the end of the iteration.
{code}
for (StorageLocation dataDir : dataDirs) {
  dataDir.makeBlockPoolDir(bpid, null);
  try {
    final List<Callable<StorageDirectory>> callables = Lists.newArrayList();
    final List<StorageDirectory> dirs = bpStorage.recoverTransitionRead(
        nsInfo, dataDir, startOpt, callables, datanode.getConf());
    if (callables.isEmpty()) {
      ......
    } else {
      for (Callable<StorageDirectory> c : callables) {
        tasks.add(new UpgradeTask(dataDir, executor.submit(c)));
      }
    }
  } catch (IOException e) {
    ......
  }
}
{code}
The doUpgrade task itself updates the shared layoutVersion field:
{code}
this.layoutVersion = HdfsServerConstants.DATANODE_LAYOUT_VERSION;
{code}
This breaks upgrade task generation for the remaining datadirs
(BlockPoolSliceStorage.java). Once a submitted task has updated layoutVersion,
the second if condition fails for those datadirs, so they are never added to the
upgrade task list. As a result, only some of the disks are upgraded to the new
layout format and the rest are not. Restarting the DataNodes reduces the number
of missing disks and blocks.
{code}
if (this.layoutVersion > HdfsServerConstants.DATANODE_LAYOUT_VERSION) {
  int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
  LOG.info("Restored " + restored + " block files from trash " +
      "before the layout upgrade. These blocks will be moved to " +
      "the previous directory during the upgrade");
}
if (this.layoutVersion > HdfsServerConstants.DATANODE_LAYOUT_VERSION
    || this.cTime < nsInfo.getCTime()) {
  doUpgrade(sd, nsInfo, callables, conf); // upgrade
  return true;
}
{code}
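The interleaving described above can be reproduced outside HDFS with a small, self-contained sketch. This is not the DataNode code: the class LayoutRaceSketch, the method scheduleUpgrades, the tasksRunEarly flag, and the two version constants are hypothetical stand-ins, and the cTime part of the condition is left out. The sketch only shows how a layoutVersion field written by an already-submitted task changes the check made for the next datadir.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LayoutRaceSketch {
    // Hypothetical values; as in the check above, a larger number means an older layout.
    static final int SOFTWARE_LAYOUT_VERSION = -57;
    static final int ON_DISK_LAYOUT_VERSION = -56;

    // Shared field playing the role of this.layoutVersion.
    private volatile int layoutVersion = ON_DISK_LAYOUT_VERSION;

    /**
     * Returns the datadirs that got an upgrade task.
     * tasksRunEarly = true  : each task finishes before the next datadir is checked.
     * tasksRunEarly = false : tasks are held back until every datadir has been checked.
     */
    List<String> scheduleUpgrades(List<String> dataDirs, boolean tasksRunEarly)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch allDirsChecked = new CountDownLatch(1);
        List<String> scheduled = new ArrayList<>();
        List<Future<Object>> futures = new ArrayList<>();
        try {
            for (String dir : dataDirs) {
                // Same shape as the upgrade check, minus the cTime clause.
                if (layoutVersion > SOFTWARE_LAYOUT_VERSION) {
                    scheduled.add(dir);
                    Future<Object> f = executor.submit(() -> {
                        if (!tasksRunEarly) {
                            allDirsChecked.await();
                        }
                        // Stand-in for the assignment inside doUpgrade.
                        layoutVersion = SOFTWARE_LAYOUT_VERSION;
                        return null;
                    });
                    futures.add(f);
                    if (tasksRunEarly) {
                        f.get(); // force the unlucky interleaving
                    }
                }
            }
            allDirsChecked.countDown();
            for (Future<Object> f : futures) {
                f.get();
            }
        } finally {
            allDirsChecked.countDown(); // never leave a task blocked
            executor.shutdown();
        }
        return scheduled;
    }

    public static void main(String[] args) throws Exception {
        List<String> dirs = Arrays.asList("/data1", "/data2", "/data3");
        // Task runs before the next check: only the first datadir is scheduled.
        System.out.println(new LayoutRaceSketch().scheduleUpgrades(dirs, true));
        // Tasks run after all checks: every datadir is scheduled.
        System.out.println(new LayoutRaceSketch().scheduleUpgrades(dirs, false));
    }
}
```

In a real DataNode the timing falls somewhere between these two forced extremes, which would be consistent with only part of the disks being upgraded on a given start.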