[jira] [Updated] (HDFS-11096) Support rolling upgrade between 2.x and 3.x

Sean Mackrory (JIRA) Fri, 20 Oct 2017 13:58:53 -0700

     [ https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Mackrory updated HDFS-11096:
---------------------------------
    Attachment: HDFS-11096.006.patch

It's possible, but will be tough.

I worked with [~rchiang] to get past the YARN issues I was having. By 
specifying both the hostname (required by the shell scripts) and the address 
(hostname + port) for all of the YARN ports, I was able to get it to work. I feel 
this is possibly an incompatible change in YARN, since in Hadoop 2.x YARN works 
fine with just the hostname specified (as long as everything uses the default 
ports), but I'll leave it to [~rchiang]'s judgement whether there's a good 
enough reason for it, in which case we can put some documentation in place. 
Specifying the ports in a Hadoop 2.x cluster prior to upgrade wouldn't be too bad.
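As a minimal sketch of what specifying both the hostname and the address can look like, the Python below generates explicit ResourceManager properties for one host and renders them as yarn-site.xml entries. It assumes the standard default RM ports (8032, 8030, 8031, 8033, 8088); the exact set of properties the test harness needed is not listed in this comment, and the hostname and `rm_id` values are hypothetical.

```python
from xml.sax.saxutils import escape

# Standard default ResourceManager ports (assumed; adjust if the cluster overrides them).
RM_DEFAULT_PORTS = {
    "yarn.resourcemanager.address": 8032,
    "yarn.resourcemanager.scheduler.address": 8030,
    "yarn.resourcemanager.resource-tracker.address": 8031,
    "yarn.resourcemanager.admin.address": 8033,
    "yarn.resourcemanager.webapp.address": 8088,
}


def explicit_rm_properties(hostname, rm_id=None):
    """Return the hostname property plus an explicit host:port for every RM address.

    With ResourceManager HA, properties are suffixed with the RM id (e.g. ".rm1").
    """
    suffix = f".{rm_id}" if rm_id else ""
    props = {f"yarn.resourcemanager.hostname{suffix}": hostname}
    for key, port in RM_DEFAULT_PORTS.items():
        props[f"{key}{suffix}"] = f"{hostname}:{port}"
    return props


def to_site_xml(props):
    """Render properties as a yarn-site.xml <configuration> block."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append(
            f"  <property><name>{escape(name)}</name>"
            f"<value>{escape(value)}</value></property>"
        )
    lines.append("</configuration>")
    return "\n".join(lines)
```

For example, `to_site_xml(explicit_rm_properties("rm1.example.com", "rm1"))` yields a block that can be merged into a 2.x cluster's yarn-site.xml before the upgrade.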

I then repeatedly encountered failures caused by timeouts in both ZooKeeper 
and the JournalNodes. After increasing a couple of timeouts, I was able to get 
it working reliably again. Other changes in the revision I'm posting right now 
(.006):

* Where it applies to both YARN and HDFS, I've stopped using NAMENODES and 
DATANODES and now use MASTERS and WORKERS.
* I fixed the sole shellcheck issue above. It was not raised locally, so my 
version of shellcheck must be out of sync. I can't confirm that I've eliminated 
the others until Yetus runs.
* I've added more distcp-over-webhdfs tests: to, from, and on both the old and 
new clusters. They're all working perfectly.
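The distcp-over-webhdfs matrix can be sketched as follows, under the assumption that "to, from, and on both clusters" means copying in each direction with the job launched from each cluster. The hostnames and paths are hypothetical; 50070 and 9870 are the default NameNode HTTP ports in Hadoop 2.x and 3.x respectively.

```python
from itertools import product

# Hypothetical NameNode HTTP endpoints for the old (2.x) and new (3.x) clusters.
CLUSTERS = {
    "old": "webhdfs://nn-old.example.com:50070",
    "new": "webhdfs://nn-new.example.com:9870",
}

# Copy directions: old -> new and new -> old.
DIRECTIONS = [("old", "new"), ("new", "old")]


def distcp_matrix(src_path="/tmp/distcp-src", dst_path="/tmp/distcp-dst"):
    """Return (cluster to run on, argv) pairs covering every direction on every cluster."""
    cases = []
    for runner, (src, dst) in product(CLUSTERS, DIRECTIONS):
        argv = ["hadoop", "distcp", CLUSTERS[src] + src_path, CLUSTERS[dst] + dst_path]
        cases.append((runner, argv))
    return cases
```

Each `argv` would be executed on a gateway host of the `runner` cluster, so both the 2.x and 3.x distcp clients get exercised against both WebHDFS versions.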
 
Currently the only issue I see is that the ResourceManager port 8032 stops 
listening towards the end of the rolling upgrade test. ResourceManager does not 
log any problems, and I don't see any other issues. But after we stop all the 
loops of MapReduce jobs that were running during the rolling upgrade, we can't 
query the job history to confirm they were all successful, because it can't 
connect to :8032 on either node. Other ResourceManager services are still 
listening. This happens even if I comment out the YARN rolling upgrade step.
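A quick way to confirm which node and port have stopped accepting connections is a plain TCP probe. This is a generic sketch, not part of the patch; the hosts passed in would be the two ResourceManager nodes, and the names `is_listening` and `probe` are hypothetical helpers.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable host
        return False


def probe(hosts, ports, timeout=2.0):
    """Map every (host, port) pair to whether it is currently accepting connections."""
    return {(h, p): is_listening(h, p, timeout) for h in hosts for p in ports}
```

Running `probe(["rm1.example.com", "rm2.example.com"], [8030, 8031, 8032, 8033, 8088])` before and after the upgrade loop would show whether only 8032 drops out while the other ResourceManager services keep listening.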

I may need to get some more help from [~rchiang] debugging that again. I'm also 
going to try running this against branch-3.0 instead of trunk, to eliminate 
some instability I may be seeing.

> Support rolling upgrade between 2.x and 3.x
> -------------------------------------------
>
>                 Key: HDFS-11096
>                 URL: https://issues.apache.org/jira/browse/HDFS-11096
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rolling upgrades
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: Sean Mackrory
>            Priority: Blocker
>         Attachments: HDFS-11096.001.patch, HDFS-11096.002.patch, 
> HDFS-11096.003.patch, HDFS-11096.004.patch, HDFS-11096.005.patch, 
> HDFS-11096.006.patch
>
>
> trunk has a minimum software version of 3.0.0-alpha1. This means we can't 
> rolling upgrade between branch-2 and trunk.
> This is a showstopper for large deployments. Unless there are very compelling 
> reasons to break compatibility, let's restore the ability to rolling upgrade 
> to 3.x releases.
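The problem in the description is a minimum-version gate: a 2.x peer is rejected because its version sorts below 3.0.0-alpha1. A simplified, hypothetical sketch of such a check follows; it is not Hadoop's actual version-comparison code, and it compares pre-release suffixes as plain strings (so "alpha10" would sort before "alpha2").

```python
def version_key(version):
    """Sort key for versions like '2.8.2' or '3.0.0-alpha1'.

    A release (no suffix) sorts after any pre-release with the same numbers.
    """
    base, _, suffix = version.partition("-")
    numbers = tuple(int(part) for part in base.split("."))
    return (numbers, suffix == "", suffix)


def is_supported(peer_version, min_supported):
    """True if peer_version is at least the minimum supported version."""
    return version_key(peer_version) >= version_key(min_supported)
```

With a minimum of "3.0.0-alpha1", `is_supported("2.8.2", "3.0.0-alpha1")` is False, which is why branch-2 nodes cannot join during a rolling upgrade; lowering the minimum to a 2.x release makes the same check pass.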



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
