[ https://issues.apache.org/jira/browse/TEZ-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated TEZ-4152:
------------------------------
    Description: 
The Jiras under HADOOP-13363 cover the protobuf upgrade and relocation in
Hadoop.

Tez is on protobuf 2.5, while Hadoop 3.3 is on protobuf 3.x.
Tez usually follows Hadoop's dependency versions, so moving to Hadoop 3.3
means a protobuf upgrade in Tez as well. Tez will also need an additional
relocation-like step, because Hadoop expects protobuf messages from the
org.apache.hadoop.thirdparty.protobuf package, even where protobuf is not
exposed in public APIs. For example:
{code}
java.lang.ClassCastException: org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto cannot be cast to org.apache.hadoop.thirdparty.protobuf.Message
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy11.submitDAG(Unknown Source)
        at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:706)
        at org.apache.tez.client.TezClient.submitDAG(TezClient.java:593)
        at org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting(TestMockDAGAppMaster.java:392)
{code}

Here is the failing line:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230
{code}
final Message theRequest = (Message) args[1];
{code}
Relocation rewrites the package names throughout the binary, so here Message
refers to org.apache.hadoop.thirdparty.protobuf.Message. Tez supplies a
non-relocated message, which implements com.google.protobuf.Message instead.
The code still compiles because the Hadoop method signature takes Object, so
the mismatch only surfaces at runtime.

So even if protobuf 2.5 is fully compatible with 3.x (not checked yet), extra
effort is needed on the Tez side to generate protobuf messages that are
compatible with Hadoop's relocated classes. As these classes are generated
from proto files, we cannot and should not patch them by manually editing the
Java sources.
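
To make the compile-time vs. runtime point concrete, here is a minimal,
self-contained sketch. It does not use protobuf or Hadoop at all: the nested
interfaces GoogleProtobuf.Message and ThirdpartyProtobuf.Message, and the
classes TezStyleRequest and RelocatedRequest, are hypothetical stand-ins for
the non-relocated and relocated types. Only the shape of the cast mirrors the
failing line above.

```java
// Hypothetical stand-ins only: nothing here is real protobuf or Hadoop code.
final class RelocationDemo {

    /** Stands in for com.google.protobuf (what Tez's generated classes implement). */
    static final class GoogleProtobuf {
        interface Message { byte[] toByteArray(); }
    }

    /** Stands in for org.apache.hadoop.thirdparty.protobuf (what Hadoop 3.3 expects). */
    static final class ThirdpartyProtobuf {
        interface Message { byte[] toByteArray(); }
    }

    /** Hypothetical request built against the non-relocated interface. */
    static final class TezStyleRequest implements GoogleProtobuf.Message {
        @Override public byte[] toByteArray() { return new byte[] {1, 2, 3}; }
    }

    /** Hypothetical request built against the relocated interface. */
    static final class RelocatedRequest implements ThirdpartyProtobuf.Message {
        @Override public byte[] toByteArray() { return new byte[] {1, 2, 3}; }
    }

    /** Same shape as the failing line: args is Object[], so the cast compiles. */
    static ThirdpartyProtobuf.Message invoke(Object[] args) {
        return (ThirdpartyProtobuf.Message) args[1];
    }

    /** Returns true when the cast is rejected at runtime. */
    static boolean castFails(Object request) {
        try {
            invoke(new Object[] {null, request});
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Identical method sets do not help: the two interfaces are unrelated types.
        System.out.println("non-relocated rejected: " + castFails(new TezStyleRequest()));
        System.out.println("relocated rejected: " + castFails(new RelocatedRequest()));
    }
}
```

The two interfaces declare the same method, yet the JVM treats them as
unrelated types, which is why API compatibility between protobuf 2.5 and 3.x
alone would not resolve the exception.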

I'm thinking of a Maven profile-based approach that can take care of both the
protobuf 3 upgrade and generating relocation-compatible protobuf objects on
the Tez side.

> Upgrade to protobuf 3.x and take care of relocated protobuf classes
> -------------------------------------------------------------------
>
>                 Key: TEZ-4152
>                 URL: https://issues.apache.org/jira/browse/TEZ-4152
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
