[
https://issues.apache.org/jira/browse/TEZ-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated TEZ-4152:
------------------------------
Description:
Jiras under HADOOP-13363 cover the process of protobuf upgrade and relocation
in Hadoop.
Tez is on protobuf 2.5, while Hadoop 3.3 is on protobuf 3.x.
Tez usually follows Hadoop's dependency versions, so upgrading to Hadoop 3.3
means a protobuf upgrade in Tez as well. An additional relocation step will
also be needed in Tez, because Hadoop expects protobuf messages from the
org.apache.hadoop.thirdparty.protobuf package, even where that package is not
exposed in public APIs. For example:
{code}
java.lang.ClassCastException: org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto cannot be cast to org.apache.hadoop.thirdparty.protobuf.Message
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy11.submitDAG(Unknown Source)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:706)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:593)
at org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting(TestMockDAGAppMaster.java:392)
{code}
Here is the failing line:
https://github.com/apache/hadoop/blob/e103c83765898f756f88c27b2243c8dd3098a989/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L232
{code}
final Message theRequest = (Message) args[1];
{code}
Relocation rewrites the package names in the compiled classes, so here Message
refers to org.apache.hadoop.thirdparty.protobuf.Message. Tez, however, supplies
a non-relocated message that implements com.google.protobuf.Message. The Hadoop
method signature takes Object, so the code compiles and only fails at runtime.
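To illustrate why this compiles but fails at runtime, here is a minimal, self-contained sketch. It does not use the real protobuf or Hadoop classes: OriginalMessage and RelocatedMessage are hypothetical stand-ins for com.google.protobuf.Message and org.apache.hadoop.thirdparty.protobuf.Message, and invoke() only mimics the cast in ProtobufRpcEngine.

```java
public class RelocationCastDemo {
    // Hypothetical stand-in for com.google.protobuf.Message.
    public interface OriginalMessage {}

    // Hypothetical stand-in for org.apache.hadoop.thirdparty.protobuf.Message.
    // After relocation the two interfaces are unrelated types to the JVM.
    public interface RelocatedMessage {}

    // Like a message generated against the non-relocated protobuf.
    public static final class SubmitDagRequest implements OriginalMessage {}

    // Like a message generated against the relocated protobuf.
    public static final class RelocatedSubmitDagRequest implements RelocatedMessage {}

    // Mimics the cast in the RPC engine: the parameter type is Object,
    // so the compiler cannot reject a non-relocated message.
    public static String invoke(Object[] args) {
        try {
            RelocatedMessage request = (RelocatedMessage) args[1];
            return "accepted " + request.getClass().getSimpleName();
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Non-relocated message: compiles, fails only at the cast.
        System.out.println(invoke(new Object[] {null, new SubmitDagRequest()}));
        // Relocated message: the cast succeeds.
        System.out.println(invoke(new Object[] {null, new RelocatedSubmitDagRequest()}));
    }
}
```

The first call hits the same kind of ClassCastException as in the stack trace; the second succeeds because the message implements the relocated interface, which is what the Tez-generated classes would need to do.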
So even if protobuf 2.5 is fully compatible with 3.x (not checked yet), we will
need extra effort on the Tez side to generate protobuf messages that are
compatible with Hadoop's relocated messages. As these classes are generated
from proto files, we cannot and should not hack them by manually manipulating
the Java source.
I'm thinking of a Maven-profile-based approach that can take care of both
protobuf 3 and relocation-compatible protobuf objects on the Tez side.
was:
Jiras under HADOOP-13363 cover the process of protobuf upgrade and relocation
in Hadoop.
Tez is on protobuf 2.5, while hadoop 3.3 is on protobuf 3.x.
Tez usually follows hadoop with dependencies, so a hadoop 3.3 upgrade means a
protobuf upgrade in tez as well, and an additional relocation-ish step will be
needed in tez as e.g. hadoop expects protobuf messages with
org.apache.hadoop.thirdparty.protobuf package, even if it's not exposed to
public APIs, for example:
{code}
java.lang.ClassCastException:
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto
cannot be cast to org.apache.hadoop.thirdparty.protobuf.Message
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy11.submitDAG(Unknown Source)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:706)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:593)
at
org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting(TestMockDAGAppMaster.java:392)
{code}
Here is the failing line:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L237
{code}
final Message theRequest = (Message) args[1];
{code}
relocation means full rewrite in binary, so here Message refers to
org.apache.hadoop.thirdparty.protobuf.Message, but tez supplies a non-relocated
one, which implements com.google.protobuf.Message (hadoop method signature
contains Object, so it compiles)
so even if protobuf 2.5 is fully compatible with 3.x (not checked yet), we will
need extra effort on tez side to generate protobuf messages which are
compatible with hadoop relocated messages...as these classes are generated from
proto files, we cannot and shouldn't hack them by manual java source code
manipulation
I'm thinking of a maven profile based approach which can take care of both
protobuf 3 and relocation compatible protobuf objects from tez side
> Upgrade to protobuf 3.x and take care of relocated protobuf classes
> -------------------------------------------------------------------
>
> Key: TEZ-4152
> URL: https://issues.apache.org/jira/browse/TEZ-4152
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
>