[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270899#comment-16270899 ]

Jason Lowe commented on HADOOP-15059:
-------------------------------------

Thanks for taking a look, Daryn!

bq. Not to mention we are now stuck supporting two files. If the format changes 
again, do we have to support 3 files?

We would only support each file location as long as that token format is 
supported.  Just like the alternative "bridge releases" approach, we would 
eventually remove support for older token versions (and thus their locations).  
We would not be forced to support umpteen files unless we wanted to support 
umpteen versions of the token format.  The advantage of what I'm calling the 
"bridge releases" approach is that there's only one place where tokens are 
written, no ugly fallback code, etc., but a significant disadvantage is that it 
forces two things:
# a strict upgrade path between releases (i.e.: users must fully update across 
all applications to the bridge release before upgrading further).
# the new token format must be introduced long before it can actually be used 
(a rough sketch of this gating follows below).
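
To illustrate that second point, here is a rough sketch of what the write-side 
gating could look like; the class and constant names are made up for this 
example and this is not the actual Credentials code:

{code:java}
// Hypothetical sketch of the "bridge release" write path: the bridge release
// ships code that can READ both layouts but keeps WRITING the old one.  A
// later release flips WRITE_VERSION once every supported consumer is known
// to understand the new layout.
import java.io.DataOutputStream;
import java.io.IOException;

public class BridgeReleaseSketch {
  static final byte OLD_VERSION = 0;   // legacy layout that 2.x readers understand
  static final byte NEW_VERSION = 1;   // newer layout (e.g. the protobuf-based one)
  // Stays OLD_VERSION for the entire bridge line even though NEW_VERSION
  // support is already present in the code.
  static final byte WRITE_VERSION = OLD_VERSION;

  static void writeTokenStorage(DataOutputStream out, byte[] serializedTokens)
      throws IOException {
    out.writeByte(WRITE_VERSION);
    // The body must match whichever version byte was written above.
    out.write(serializedTokens);
  }
}
{code}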

We should not underestimate the burden of a bridge release, especially if 
there's only one magical release.  It's not enough to qualify that the software 
works on the bridge release; one must verify via other means that every part of 
the application is using the new bridge release jars.  If any part of the 
application bundles its own, older Hadoop jars, the app will still work fine on 
the bridge release but will fail when the cluster upgrades.  Therefore it's 
difficult for admins and users to know for sure they're ready to move beyond 
the bridge release safely, because knowing the app runs on the bridge release 
isn't sufficient.

The multi-location approach is not pretty at all, but it is much more flexible 
in upgrade paths and ability for newer software to leverage the new format once 
it's introduced.
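
To be concrete about what I mean, here is a minimal sketch of the 
multi-location read and write paths; the file names (container_tokens.v1 in 
particular) are placeholders I invented for the example, not necessarily what 
an actual patch would use:

{code:java}
// Minimal sketch of the multi-location idea; file names are hypothetical.
// New writers emit the old format at the old path (so 2.x readers keep
// working during a rolling upgrade) and the new format at a new path; new
// readers prefer the new path and fall back to the old one.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class MultiLocationSketch {
  static File pickTokenFile(File containerDir) {
    File newLoc = new File(containerDir, "container_tokens.v1");  // hypothetical name
    File oldLoc = new File(containerDir, "container_tokens");
    return newLoc.exists() ? newLoc : oldLoc;   // prefer new, fall back to old
  }

  static void writeTokenFiles(File containerDir,
                              byte[] oldFormatBytes,
                              byte[] newFormatBytes) throws IOException {
    Files.write(new File(containerDir, "container_tokens").toPath(), oldFormatBytes);
    Files.write(new File(containerDir, "container_tokens.v1").toPath(), newFormatBytes);
  }
}
{code}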

bq. The whole point of writing the version into the file is to support multiple 
versions.

Yes, and writing that version does enable supporting multiple versions, _but 
only for software that understands what those version numbers mean_.  Adding a 
version number enables newer software to read an older file format, but old 
software will not be able to read a newer file format (the sketch after the 
list below shows why).  I only see two options for supporting older software 
consuming tokens created by newer software:
# write the new version to a different location and have new software prefer 
the other location with fallback
# or force everyone to use the old format until the old software is no longer 
supported, and then switch to the new format in the existing files.
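
To spell out why the version byte only helps in one direction, here is a 
simplified sketch of an old reader (not the actual 
Credentials.readTokenStorageStream code): it has no branch for a version it 
has never seen, so a file written in the newer format can only produce the 
"Unknown version 1 in token storage" failure shown in the stack trace below.

{code:java}
// Simplified sketch of a 2.x-era reader.  It only knows version 0, so any
// newer version falls through to the failure case.  A newer reader would
// simply add a "case 1" branch, which is why the version byte lets new
// software read old files but not the other way around.
import java.io.DataInputStream;
import java.io.IOException;

public class OldReaderSketch {
  static void readTokenStorage(DataInputStream in) throws IOException {
    byte version = in.readByte();
    switch (version) {
      case 0:
        // ... parse the legacy layout this reader was built against ...
        break;
      default:
        throw new IOException("Unknown version " + version + " in token storage.");
    }
  }
}
{code}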

I see the theoretical problem with a 3.x create -> 2.x rewrite -> 3.x read 
scenario, although I suspect it would not happen in practice.  If it does, the 
fix is to force the 2.x code to upgrade to 3.x, and that's essentially what 
we're requiring with a bridge release.

I'm completely OK with going with the "bridge release" approach for this if we 
decide it is the right thing to do here.  It could make more sense if the new 
token format isn't providing any features leveraged today that cannot be 
expressed in the old token format, i.e.: no use cases are broken by forcing the 
old format even in 3.0.  There would be no pressing need to move to the new 
format, so hopefully the old format could remain the default for many 3.x 
releases.  That could increase the likelihood that everyone has naturally 
updated to some version with v1 support before we start forcing v1 on everyone.


> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15059
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>            Reporter: Junping Du
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch
>
>
> I tried to deploy a 3.0 cluster with the 2.9 MR tar ball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>       at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>       at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>       at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>       ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>       ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before we 
> ship 3.0; otherwise all running MR applications will get stuck during/after 
> the upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to