[ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911274#comment-13911274
 ] 

stack commented on HDFS-5535:
-----------------------------

A few late comments on the design doc:

+ "It has an added benefit of not losing data locality, which is critical for 
applications such as HBase."

Thanks for considering clients that keep files open for as long as they are up.
+1 on the suggestion that we stall the pipeline rather than have the restarted
replica break existing pipelines.

+ "For example, MTTR should be less than the default client socket timeout for 
successful restarts."

What are you folks thinking here? I saw 60 seconds earlier in the doc. Some
HBase deploys have this ratcheted down to a few seconds or so (for example, see
http://goo.gl/Ue3FPl, where Pinterest talks about a 3-second socket timeout on
read, a 5-second socket timeout on write, and a 1-second IPC timeout with
retries set to two). It'd be coolio if we didn't have to rolling-restart HBase
on top of an HDFS rolling restart, i.e. if the two could be done independently
of each other without incurring loss of locality.

+ "3. Clients"

It says "For shutdown, clients may choose to copy partially written replica to 
another node..." Would the DFSClient do this internally (or 'may' do this)?

Good stuff.

+ "Rollback and downgrade requires cluster downtime and is not done in a 
rolling fashion."

Downgrade sounds like it will be a load of work (I used to work at a place
where eng. spent 30-40% of its time making sure migrations from one version to
another worked going forward and backward, because someone thought it was a
good idea without realizing, I'm sure, the cost involved). Are you sure you
want to support that?

> Umbrella jira for improved HDFS rolling upgrades
> ------------------------------------------------
>
>                 Key: HDFS-5535
>                 URL: https://issues.apache.org/jira/browse/HDFS-5535
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, ha, hdfs-client, namenode
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Nathan Roberts
>         Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, 
> h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch, 
> h5535_20140221-2031.patch, h5535_20140224-1931.patch
>
>
> In order to roll a new HDFS release through a large cluster quickly and 
> safely, a few enhancements are needed in HDFS. An initial High level design 
> document will be attached to this jira, and sub-jiras will itemize the 
> individual tasks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
