[
https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286128#comment-14286128
]
Arpit Agarwal edited comment on HDFS-7645 at 1/21/15 7:34 PM:
--------------------------------------------------------------
The first restore was by design when the rolling upgrade feature was added
(HDFS-6005). It simplified the rollback procedure by not requiring the
{{-rollback}} flag to be passed to the DataNode, so regular startup and
rollback could be treated the same way, by restoring from trash.

HDFS-6800 added back the requirement to pass the {{-rollback}} flag during RU
rollback, to support layout changes. The second restore was a side effect of
the same fix. We can probably eliminate both restores now.

bq. I think we should get rid of trash and just always create a previous/
directory when doing rolling upgrade, the same as we do with regular upgrade.
The speed is clearly acceptable since we've done these upgrades in the field
when switching to the blockid-based layout with no problems. And it will be a
lot more maintainable and less confusing.

DN layout changes will be rare for minor/point releases. I am wary of
eliminating trash without some numbers showing that hard link performance with
millions of blocks is on par with trash. Even a few seconds per DN adds up to
many hours or days when upgrading thousands of DNs sequentially. Once we fix
the issue raised by Nathan, the overhead of trash compared to regular startup
is nil.
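
To put rough numbers on the sequential-upgrade concern, here is a
back-of-the-envelope sketch. The per-DN overhead and the cluster size are
illustrative assumptions, not measurements from any cluster, and the class and
method names are hypothetical.

```java
import java.time.Duration;

public class UpgradeOverheadEstimate {

    /**
     * Total extra wall-clock time when DataNodes are upgraded one after
     * another and each one pays the same fixed extra cost.
     */
    static Duration totalOverhead(Duration perDataNode, int dataNodes) {
        return perDataNode.multipliedBy(dataNodes);
    }

    public static void main(String[] args) {
        int dataNodes = 4000;                          // assumed cluster size
        long[] perDnSeconds = {1, 5, 30};              // assumed per-DN overheads

        for (long s : perDnSeconds) {
            Duration total = totalOverhead(Duration.ofSeconds(s), dataNodes);
            double hours = total.getSeconds() / 3600.0;
            System.out.printf("%d s per DN x %d DNs = %.1f hours%n",
                    s, dataNodes, hours);
        }
    }
}
```

Under these assumptions, 5 seconds per DN across 4000 DNs is 20,000 seconds,
or a little over five and a half hours of added upgrade time.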
> Rolling upgrade is restoring blocks from trash multiple times
> -------------------------------------------------------------
>
> Key: HDFS-7645
> URL: https://issues.apache.org/jira/browse/HDFS-7645
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: Nathan Roberts
>
> When performing an HDFS rolling upgrade, the trash directory is restored
> twice, when under normal circumstances it shouldn't need to be restored at
> all. IIUC, the only time these blocks should be restored is if we need to
> roll back a rolling upgrade.
> On a busy cluster, this can cause significant and unnecessary block churn
> both on the datanodes and, more importantly, in the namenode.
> The two times this happens are:
> 1) Restart of the DN onto new software
> {code}
> private void doTransition(DataNode datanode, StorageDirectory sd,
>     NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
>   if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
>     Preconditions.checkState(!getTrashRootDir(sd).exists(),
>         sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " +
>         " both be present.");
>     doRollback(sd, nsInfo); // rollback if applicable
>   } else {
>     // Restore all the files in the trash. The restored files are retained
>     // during rolling upgrade rollback. They are deleted during rolling
>     // upgrade downgrade.
>     int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
>     LOG.info("Restored " + restored + " block files from trash.");
>   }
> {code}
> 2) When the heartbeat response no longer indicates that a rolling upgrade is
> in progress
> {code}
> /**
>  * Signal the current rolling upgrade status as indicated by the NN.
>  * @param inProgress true if a rolling upgrade is in progress
>  */
> void signalRollingUpgrade(boolean inProgress) throws IOException {
>   String bpid = getBlockPoolId();
>   if (inProgress) {
>     dn.getFSDataset().enableTrash(bpid);
>     dn.getFSDataset().setRollingUpgradeMarker(bpid);
>   } else {
>     dn.getFSDataset().restoreTrash(bpid);
>     dn.getFSDataset().clearRollingUpgradeMarker(bpid);
>   }
> }
> {code}
> HDFS-6800 and HDFS-6981 modified this behavior, so it is not completely
> clear whether this is intentional.
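
The behavior the report asks for, restoring from trash only when a rolling
upgrade is actually being rolled back, can be sketched as a small decision
function. This is a minimal illustration and not the actual patch: the
{{TrashRestorePolicy}} class, its nested enum, and the method name are
hypothetical stand-ins, and no real Hadoop classes are used.

```java
public class TrashRestorePolicy {

    /** Hypothetical stand-in for the DataNode startup options relevant here. */
    enum StartupOption { REGULAR, ROLLBACK }

    /**
     * Decide whether block files should be moved back out of trash when the
     * DataNode starts. Following the issue description, this is only needed
     * when rolling back, and only if there is a trash directory to restore.
     */
    static boolean shouldRestoreTrashOnStartup(StartupOption startOpt,
                                               boolean trashExists) {
        return startOpt == StartupOption.ROLLBACK && trashExists;
    }

    public static void main(String[] args) {
        // A regular restart onto new software leaves trash untouched.
        System.out.println(
            shouldRestoreTrashOnStartup(StartupOption.REGULAR, true));
        // A rollback with trash present restores the block files.
        System.out.println(
            shouldRestoreTrashOnStartup(StartupOption.ROLLBACK, true));
    }
}
```

The sketch deliberately says nothing about what should happen to trash when
the heartbeat reports that the rolling upgrade has ended; that choice is left
to the discussion on this issue.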