[ 
https://issues.apache.org/jira/browse/TEZ-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090713#comment-17090713
 ] 

László Bodor edited comment on TEZ-4152 at 4/23/20, 4:35 PM:
-------------------------------------------------------------

seems like it's a bit harder than I first thought
my plan was a solution where we don't need to change tez protobuf code when we 
upgrade to hadoop 3.3, I was thinking of some wrappers, but I bumped into walls 
everywhere, so far

Here are the current constraints:
1. Hadoop's rpc engine will cast the provided message instance to its current 
Message class (now it's com.google.protobuf.Message, from hadoop 3.3, it'll be 
a org.apache.hadoop.thirdparty.protobuf)
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230

2. Considering the exception trace above, let's start from 
TezClient.submitDAGSession, which calls a method which implements a generated 
interface and expects a specific message subclass: SubmitDAGRequestProto
https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientAMProtocolBlockingPBServerImpl.java#L159
SubmitDAGRequestProto is final

So I need an instance which is SubmitDAGRequestProto but cannot be a subclass 
of it (it's final) but at the same time I need an abstraction from Message 
interface (as I want is to be easily pluggable) 

1. if I simply wrap original SubmitDAGRequestProto into a <? extends Message>, 
which esentially a superclass of SubmitDAGRequestProto, I would solve the 
hadoop adaptation (as the wrapper can be changed easily for the whole project), 
but I cannot cast it back to SubmitDAGRequestProto
2. I cannot even use a subclass of SubmitDAGRequestProto for wrapping it, 
becuase SubmitDAGRequestProto is generated as final by protobuf by design

cc: [~ashutoshc], [~jeagles]
or we can do hadoop 3.3 upgrade with a breaking change, where we simply adapt 
to its new thirdparty protobuf package
I still think there is a clean, non-breaking way of creating some abstraction 
layer between hadoop and tez, but I haven't figured out

(I was thinking about shading as always, but in this case shading itself is 
kind of orthogonal to the problems mentioned above)


was (Author: abstractdog):
seems like it's a bit harder than I first thought
my plan was a solution where we don't need to change tez protobuf code when we 
upgrade to hadoop 3.3, I was thinking of some wrappers, but I bumped into walls 
everywhere, so far

Here are the current constraints:
1. Hadoop's rpc engine will cast the provided message instance to its current 
Message class (now it's com.google.protobuf.Message, from hadoop 3.3, it'll be 
a org.apache.hadoop.thirdparty.protobuf)
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230

2. Considering the exception trace above, let's start from 
TezClient.submitDAGSession, which calls a method which implements a generated 
interface and expects a specific message subclass: SubmitDAGRequestProto
https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientAMProtocolBlockingPBServerImpl.java#L159
SubmitDAGRequestProto is final

So I need an instance which is SubmitDAGRequestProto but cannot be a subclass 
of it (it's final) but at the same time I need an abstraction from Message 
interface (as I want is to be easily pluggable) 

1. if I simply wrap original SubmitDAGRequestProto into a <? extends Message>, 
which esentially a superclass of SubmitDAGRequestProto, I would solve the 
hadoop adaptation (as the wrapper can be changed easily for the whole project), 
but I cannot cast it back to SubmitDAGRequestProto
2. I cannot even use a subclass of SubmitDAGRequestProto for wrapping it, 
becuase SubmitDAGRequestProto is generated as final by protobuf by design

cc: [~ashutoshc], [~jeagles]
or we can do hadoop 3.3 upgrade with a breaking change, where we simply adapt 
to its new thirdparty protobuf package

> Upgrade to protobuf 3.x and take care of relocated protobuf classes
> -------------------------------------------------------------------
>
>                 Key: TEZ-4152
>                 URL: https://issues.apache.org/jira/browse/TEZ-4152
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> Jiras under HADOOP-13363 cover the process of protobuf upgrade and relocation 
> in Hadoop.
> Tez is on protobuf 2.5, while hadoop 3.3 is on protobuf 3.x.
> Tez usually follows hadoop with dependencies, so a hadoop 3.3 upgrade means a 
> protobuf upgrade in tez as well, and an additional relocation-ish step will 
> be needed in tez as e.g. hadoop expects protobuf messages with 
> org.apache.hadoop.thirdparty.protobuf package, even if it's not exposed to 
> public APIs, for example:
> {code}
> java.lang.ClassCastException: 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto
>  cannot be cast to org.apache.hadoop.thirdparty.protobuf.Message
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>       at com.sun.proxy.$Proxy11.submitDAG(Unknown Source)
>       at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:706)
>       at org.apache.tez.client.TezClient.submitDAG(TezClient.java:593)
>       at 
> org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting(TestMockDAGAppMaster.java:392)
> {code}
> Here is the failing line:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230
> {code}
> final Message theRequest = (Message) args[1];
> {code}
> relocation means full rewrite in binary, so here Message refers to 
> org.apache.hadoop.thirdparty.protobuf.Message, but tez supplies a 
> non-relocated one, which implements com.google.protobuf.Message (hadoop 
> method signature contains Object, so it compiles)
> so even if protobuf 2.5 is fully compatible with 3.x (not checked yet), we 
> will need extra effort on tez side to generate protobuf messages which are 
> compatible with hadoop relocated messages...as these classes are generated 
> from proto files, we cannot and shouldn't hack them by manual java source 
> code manipulation
> I'm thinking of a maven profile based approach which can take care of both 
> protobuf 3 and relocation compatible protobuf objects from tez side



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to