[
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913361#comment-13913361
]
Colin Patrick McCabe commented on HDFS-5535:
--------------------------------------------
bq. Stack said: Chatting w/ Colin too, it sounds like SSR, if it fails a local
read, it will then retry the local read again after some number of minutes have
elapsed.
Yeah. The {{DomainSocketFactory}} has a blacklist of domain socket paths, but
its entries expire after 10 minutes.
bq. Kihwal said: If the local DN was added to deadNodes in a DFSInputStream
because it was restarted, we may be able to (asynchronously?) probe and remove
it from deadNodes.
Why not just have a time limit on deadNodes entries, like 15 minutes? This would
also help with the case where a DN is temporarily overloaded, gets added to
deadNodes on a long-open file, and is never removed. We should do the simple
things first, and then perhaps move on to more complex schemes.
If you really want to get fancy, you could have a separate daemon running on
the DN which would stay up for the duration of the upgrade and tell clients
that ask what the status of the upgrade is. But that seems like a big
project when we haven't even done the simple things, like sharing information
about deadNodes between different DFSInputStreams in the same client.
> Umbrella jira for improved HDFS rolling upgrades
> ------------------------------------------------
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, ha, hdfs-client, namenode
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf,
> h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch,
> h5535_20140221-2031.patch, h5535_20140224-1931.patch,
> h5535_20140225-1225.patch
>
>
> In order to roll a new HDFS release through a large cluster quickly and
> safely, a few enhancements are needed in HDFS. An initial High level design
> document will be attached to this jira, and sub-jiras will itemize the
> individual tasks.