[ https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Mackrory updated HDFS-11096:
---------------------------------
Attachment: HDFS-11096.006.patch
It's possible, but will be tough.
I worked with [~rchiang] to get past the YARN issues I was having. By
specifying both hostname (required by shell scripts) and the address (hostname
+ ports) for all of the YARN ports, I was able to get it to work. I feel this
is possibly an incompatible change in YARN, being that YARN works fine by just
specifying the hostname (as long as everything's going to use the default
ports) in Hadoop 2.x, but I'll leave that to [~rchiang]'s judgement if there's a
good enough reason and we can put some documentation in place. Specifying the
ports in a Hadoop 2.x cluster prior to upgrade wouldn't be too bad.
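
For illustration, here is a minimal sketch of what "hostname plus every address" could look like when generated for a single ResourceManager. The property names and default ports are the standard ones from yarn-default.xml, but the exact set changed in the test setup is not listed above, so treat the selection as an assumption. The helper names (rm_properties, to_site_xml) and the example hostname are hypothetical, and an HA setup would need the per-RM-id variants of these properties instead.

```python
from xml.sax.saxutils import escape

# Default ResourceManager ports as documented in yarn-default.xml.
DEFAULT_RM_PORTS = {
    "yarn.resourcemanager.address": 8032,
    "yarn.resourcemanager.scheduler.address": 8030,
    "yarn.resourcemanager.resource-tracker.address": 8031,
    "yarn.resourcemanager.admin.address": 8033,
    "yarn.resourcemanager.webapp.address": 8088,
}


def rm_properties(hostname):
    """Return the hostname property plus an explicit host:port per RM address."""
    props = {"yarn.resourcemanager.hostname": hostname}
    for name, port in DEFAULT_RM_PORTS.items():
        props[name] = f"{hostname}:{port}"
    return props


def to_site_xml(props):
    """Render a property dict as <property> elements for yarn-site.xml."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # "rm1.example.com" is a placeholder hostname, not one from the test cluster.
    print(to_site_xml(rm_properties("rm1.example.com")))
```

Setting the same explicit addresses on the 2.x cluster before the upgrade keeps the configuration identical on both sides of the rolling upgrade.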
I then repeatedly encountered a lot of failures due to timeouts with both
ZooKeeper and JournalNodes. I increased a couple of timeouts and was able to
get it working reliably again. Other changes in the revision I'm posting (.006)
right now:
* where it applies to both YARN and HDFS, I've stopped using NAMENODES and
DATANODES and now use MASTERS and WORKERS
* I fixed the sole shellcheck issue above. It was not raised locally, so my
version must be out of sync. I can't confirm that I've eliminated the others
until Yetus runs
* I've added more distcp-over-webhdfs tests: to, from, and on both old and new
clusters. They're all working perfectly.
Currently the only issue I see is that the ResourceManager port 8032 stops
listening towards the end of the rolling upgrade test. ResourceManager does not
log any problems, and I don't see any other issues. But after we stop all the
loops of MapReduce jobs that were running during the rolling upgrade, we can't
query the job history to confirm they were all successful, because it can't
connect to :8032 on either node. Other ResourceManager services are still
listening. This happens even if I comment out the YARN rolling upgrade step.
I may need to get some more help from [~rchiang] debugging that again. I'm also
going to try running this against branch-3.0 instead of trunk, to eliminate
some instability I may be seeing.
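
To narrow down when :8032 goes away, a small probe along these lines could be run in a loop alongside the rolling upgrade test. It is only a sketch: the host names are placeholders, the port list assumes the default ResourceManager ports, and a successful TCP connect only shows that something is listening, not that the RPC service behind it is healthy.

```python
import socket

# Default ResourceManager ports; 8032 is the client RPC port that stops listening.
RM_PORTS = (8030, 8031, 8032, 8033, 8088)


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def probe_resourcemanagers(hosts, ports=RM_PORTS):
    """Map each (host, port) pair to whether it currently accepts connections."""
    return {(host, port): is_port_open(host, port) for host in hosts for port in ports}


if __name__ == "__main__":
    # Placeholder host names; substitute the two ResourceManager nodes under test.
    for (host, port), ok in sorted(probe_resourcemanagers(["rm1", "rm2"]).items()):
        print(f"{host}:{port} {'listening' if ok else 'NOT listening'}")
```

Logging the result with a timestamp after each upgrade step would show which step precedes the port closing.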
> Support rolling upgrade between 2.x and 3.x
> -------------------------------------------
>
> Key: HDFS-11096
> URL: https://issues.apache.org/jira/browse/HDFS-11096
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rolling upgrades
> Affects Versions: 3.0.0-alpha1
> Reporter: Andrew Wang
> Assignee: Sean Mackrory
> Priority: Blocker
> Attachments: HDFS-11096.001.patch, HDFS-11096.002.patch,
> HDFS-11096.003.patch, HDFS-11096.004.patch, HDFS-11096.005.patch,
> HDFS-11096.006.patch
>
>
> trunk has a minimum software version of 3.0.0-alpha1. This means we can't
> rolling upgrade between branch-2 and trunk.
> This is a showstopper for large deployments. Unless there are very compelling
> reasons to break compatibility, let's restore the ability to rolling upgrade
> to 3.x releases.