[
https://issues.apache.org/jira/browse/MAPREDUCE-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198761#comment-16198761
]
Jason Lowe commented on MAPREDUCE-6978:
---------------------------------------
Sure, there's plenty of ways to fix the problem if we modify the old code, but
backwards-compatibility implies we don't have the luxury of doing that. ;-)
bq. I suggest counters should serialized through protocol buffer.
Yes, switching MapReduce's task umbilical messaging to use protocol buffers
would help in a number of areas regarding compatibility, but protocol buffers
aren't magic. If someone were to add an enumeration value "in the middle" such
that the enumeration numbers no longer line up then protocol buffers would also
confuse the values upon receipt. Both protocol buffers and writables require
enumeration values remain constant across versions. Even if the new value is
added at the end, as intended, what will the older version of the software do
when it receives the unknown value from the new version via protocol buffers?
It still will explode trying to convert it into the enum object because that
ordinal value doesn't exist in the older enum version. The older version of
the software would need some sort of handling to address unknown values.
But again this is just one of many places in the AM-task connection that could
break when moving versions. The simplest solution is to make sure those
versions never mix within a single job, hence the recommendation to move to a
distributed cache deploy for MapReduce. It also has the nice benefit of
allowing users to choose which version they want to run on a per-job basis
(e.g.: try out a new version on a few jobs without needing to force the new
version on every job).
> MR task counters deserialized through RPC throws OutOfBoundsException if
> Counter enum class version not match
> -------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6978
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, task
> Affects Versions: 3.0.0-alpha4
> Environment: NM1 TaskCounter.class old version;
> NM2 TaskCounter.class new version (new Enumeration values appended);
> Reporter: rangjiaheng
>
> Environment:
> NM1 TaskCounter.class old version;
> NM2 TaskCounter.class new version (new Enumeration values appended);
> Result:
> When an MR app's AM running on NM1, and it's containers on NM2; the
> containers on NM2 will all failed, AM cause OutOfBoundsException;
> Reason:
> When app running, containers will report their counters to AM through RPC,
> while the Container with new version TaskCounter.class will write more
> Counter values to RPC; however, the AM with old version TaskCounter.class
> which can not read them correctly from RPC.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]