[
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911274#comment-13911274
]
stack commented on HDFS-5535:
-----------------------------
A few late comments on the design doc:
+ "It has an added benefit of not losing data locality, which is critical for
applications such as HBase."
Thanks for the consideration of clients that keep files open while they are up.
+1 on suggestion that we stall the pipeline rather than have the restarted
replica cause a break in existing pipelines.
+ "For example, MTTR should be less than the default client socket timeout for
successful restarts."
What are you folks thinking here? I saw 60 seconds earlier up in the doc. Some
HBase deploys have this ratcheted down to a few seconds or so (for example, see
http://goo.gl/Ue3FPl, where Pinterest talks about a 3 second socket timeout
on read, a 5 second write socket timeout, and 1 second on ipc w/ retries set
to two). It'd be coolio if we didn't have to rolling restart HBase on top of
an HDFS rolling restart, if they could be done independently of each other
without incurring loss of locality.
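To make the timeout concern concrete, here is a rough sketch (not from the
design doc) that checks a datanode restart time against the timeouts mentioned
above. The 30 second restart time and all function and key names are
assumptions for illustration only, and the sketch ignores client-side retries.

```python
# Timeouts in seconds. The 60 s value is the one seen in the design doc;
# the other two are the read/write figures quoted from the Pinterest deploy.
TIMEOUTS_S = {
    "doc default client socket": 60.0,
    "read socket": 3.0,
    "write socket": 5.0,
}


def outlasts_restart(restart_s: float, timeout_s: float) -> bool:
    """Return True if a client stalled on the pipeline would still be
    waiting (not yet timed out) when the datanode comes back."""
    return restart_s < timeout_s


def report(restart_s: float) -> dict:
    """Map each timeout name to whether it would outlast the restart."""
    return {name: outlasts_restart(restart_s, t) for name, t in TIMEOUTS_S.items()}


if __name__ == "__main__":
    # Hypothetical restart time; not a measured number.
    for name, ok in report(30.0).items():
        print(f"{name}: {'ok' if ok else 'times out'}")
```

Under these assumptions only the 60 second default outlasts a 30 second
restart; the deploys with timeouts ratcheted down would need the restart to
finish in under 3 seconds to avoid timing out.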
+ "3. Clients"
It says "For shutdown, clients may choose to copy partially written replica to
another node..." The DFSClient would do this internally (or 'may' do this)?
Good stuff.
+ "Rollback and downgrade requires cluster downtime and is not done in a
rolling fashion."
+ Downgrade sounds like it will be a load of work (I used to work at a place
where eng. spent 30-40% of its time making sure migrations from one version to
another worked going forward and backwards, because someone thought it was a
good idea, not realizing, I'm sure, the cost involved). You for sure want to
support that?
> Umbrella jira for improved HDFS rolling upgrades
> ------------------------------------------------
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, ha, hdfs-client, namenode
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf,
> h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch,
> h5535_20140221-2031.patch, h5535_20140224-1931.patch
>
>
> In order to roll a new HDFS release through a large cluster quickly and
> safely, a few enhancements are needed in HDFS. An initial High level design
> document will be attached to this jira, and sub-jiras will itemize the
> individual tasks.