[
https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ming Ma updated HDFS-7541:
--------------------------
Attachment: HDFS-7541-3.patch
We have been running the upgrade domain policy on one of our large production
clusters. Here are the results:
* No perf impact on write operations, specifically the RPC AddBlock latency.
* All blocks have been migrated to the upgrade domain policy.
Here is the updated version of the patch. I would appreciate any high-level
comments on the design. If people are OK with the approach, I will open
subtasks.
During the work, we also found that the balancer has a hard-coded rack-based
policy instead of leveraging the block placement policy, e.g. HDFS-1431. This
is something we should follow up on so that the balancer doesn’t need to be
modified when we introduce a new block placement policy.
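
To illustrate the balancer point, here is a minimal sketch of a balancer move
check that delegates to a pluggable policy instead of embedding rack rules.
All names (Node, PlacementPolicy, RackPolicy, UpgradeDomainPolicy, canMove) are
hypothetical and are not the Hadoop classes or the code in the attached patch.
The upgrade domain rule shown, one replica per distinct upgrade domain, is an
assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PlacementAwareBalancerSketch {

    /** Hypothetical node descriptor; not Hadoop's DatanodeInfo. */
    static final class Node {
        final String name;
        final String rack;
        final String upgradeDomain;

        Node(String name, String rack, String upgradeDomain) {
            this.name = name;
            this.rack = rack;
            this.upgradeDomain = upgradeDomain;
        }
    }

    /** Hypothetical hook: the balancer asks the policy instead of hard-coding rack rules. */
    interface PlacementPolicy {
        boolean isPlacementValid(List<Node> replicas);
    }

    /** Rack-based rule: two or more replicas must span at least two racks. */
    static final class RackPolicy implements PlacementPolicy {
        @Override
        public boolean isPlacementValid(List<Node> replicas) {
            Set<String> racks = new HashSet<>();
            for (Node n : replicas) {
                racks.add(n.rack);
            }
            return replicas.size() < 2 || racks.size() >= 2;
        }
    }

    /** Assumed upgrade-domain rule: every replica sits in a distinct upgrade domain. */
    static final class UpgradeDomainPolicy implements PlacementPolicy {
        @Override
        public boolean isPlacementValid(List<Node> replicas) {
            Set<String> domains = new HashSet<>();
            for (Node n : replicas) {
                domains.add(n.upgradeDomain);
            }
            return domains.size() == replicas.size();
        }
    }

    /** Returns true if moving one replica from source to target keeps the placement valid. */
    static boolean canMove(PlacementPolicy policy, List<Node> replicas, Node source, Node target) {
        List<Node> after = new ArrayList<>(replicas);
        after.remove(source); // identity-based removal; Node does not override equals
        after.add(target);
        return policy.isPlacementValid(after);
    }
}
```

With this shape, a move that the rack rule accepts can still be rejected by
the upgrade domain rule, and supporting a new policy means passing a different
PlacementPolicy to canMove rather than changing the balancer itself.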
> Upgrade Domains in HDFS
> -----------------------
>
> Key: HDFS-7541
> URL: https://issues.apache.org/jira/browse/HDFS-7541
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
> Attachments: HDFS-7541-3.patch, HDFS-7541.patch,
> SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf
>
>
> The current HDFS DN rolling upgrade step requires sequential DN restarts to
> minimize the impact on data availability and read/write operations. The side
> effect is a longer upgrade duration for large clusters. This might be
> acceptable for a quick DN JVM restart to update Hadoop code/configuration.
> However, for an OS upgrade that requires a machine reboot, the overall upgrade
> duration will be too long if we continue to do sequential DN rolling restarts.
>
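
The duration argument in the description above can be sketched with simple
arithmetic. This is a minimal illustration, not code from the patch: the class
and method names are hypothetical, and it assumes that every node takes the
same time to reboot and that all nodes in one upgrade domain are restarted in
parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UpgradeRoundsSketch {

    /** Groups datanode names by an assumed upgrade-domain label; each group restarts together. */
    static Map<String, List<String>> groupByUpgradeDomain(Map<String, String> nodeToDomain) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : nodeToDomain.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** Sequential rolling restart: one node at a time, so one round per node. */
    static long sequentialMinutes(int nodeCount, long minutesPerNode) {
        return nodeCount * minutesPerNode;
    }

    /** Domain-at-a-time restart: one round per upgrade domain (assumes parallel reboot within a domain). */
    static long perDomainMinutes(int domainCount, long minutesPerNode) {
        return domainCount * minutesPerNode;
    }
}
```

Under these assumptions the total time scales with the number of upgrade
domains rather than the number of datanodes.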