[
https://issues.apache.org/jira/browse/TEZ-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090713#comment-17090713
]
László Bodor commented on TEZ-4152:
-----------------------------------
This seems to be a bit harder than I thought.
My plan was a solution where we don't need to change the Tez protobuf code when
we upgrade to Hadoop 3.3. I was thinking of some wrappers, but so far I have
bumped into walls everywhere.
Here are the current constraints:
1. Hadoop's RPC engine casts the provided message instance to its current
Message class (now com.google.protobuf.Message; from Hadoop 3.3 on it will be
org.apache.hadoop.thirdparty.protobuf.Message)
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230
2. Considering the exception trace above, let's start from
TezClient.submitDAGSession, which calls a method which implements a generated
interface and expects a specific message subclass: SubmitDAGRequestProto
https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientAMProtocolBlockingPBServerImpl.java#L159
SubmitDAGRequestProto is final.
So I need an instance that is a SubmitDAGRequestProto but cannot be a subclass
of it, and at the same time I need an abstraction over the Message interface
(as I want it to be easily pluggable).
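
To make the two constraints concrete, here is a minimal sketch. It does not use the real protobuf or Hadoop classes: PlainMessage, RelocatedMessage, RelocatedWrapper, engineAccepts and serverAccepts are hypothetical stand-ins I made up for illustration, and the nested SubmitDAGRequestProto only mimics the generated class being final. It is meant to show why a wrapper satisfies one side but not the other, not to suggest an actual fix.

```java
// Hypothetical stand-ins only: none of these are the real protobuf/Hadoop/Tez types.
final class RelocationSketch {

    /** Stand-in for com.google.protobuf.Message. */
    interface PlainMessage {}

    /** Stand-in for org.apache.hadoop.thirdparty.protobuf.Message. */
    interface RelocatedMessage {}

    /** Stand-in for the generated message class, which is final. */
    static final class SubmitDAGRequestProto implements PlainMessage {}

    /** A wrapper can implement the relocated interface, but cannot extend the final class. */
    static final class RelocatedWrapper implements RelocatedMessage {
        final SubmitDAGRequestProto delegate;

        RelocatedWrapper(SubmitDAGRequestProto delegate) {
            this.delegate = delegate;
        }
    }

    /** Constraint 1: the argument arrives as Object and is cast to the relocated type. */
    static boolean engineAccepts(Object arg) {
        try {
            RelocatedMessage message = (RelocatedMessage) arg;
            return message != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    /** Constraint 2: the generated interface expects the concrete message class. */
    static boolean serverAccepts(Object arg) {
        return arg instanceof SubmitDAGRequestProto;
    }

    public static void main(String[] args) {
        SubmitDAGRequestProto request = new SubmitDAGRequestProto();
        RelocatedWrapper wrapper = new RelocatedWrapper(request);

        System.out.println("engine accepts plain request: " + engineAccepts(request));
        System.out.println("server accepts plain request: " + serverAccepts(request));
        System.out.println("engine accepts wrapper: " + engineAccepts(wrapper));
        System.out.println("server accepts wrapper: " + serverAccepts(wrapper));
    }
}
```

In this sketch neither object passes both checks: the plain request fails the cast to the relocated interface, and the wrapper is not a SubmitDAGRequestProto.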
> Upgrade to protobuf 3.x and take care of relocated protobuf classes
> -------------------------------------------------------------------
>
> Key: TEZ-4152
> URL: https://issues.apache.org/jira/browse/TEZ-4152
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
>
> Jiras under HADOOP-13363 cover the process of protobuf upgrade and relocation
> in Hadoop.
> Tez is on protobuf 2.5, while hadoop 3.3 is on protobuf 3.x.
> Tez usually follows hadoop with dependencies, so a hadoop 3.3 upgrade means a
> protobuf upgrade in tez as well, and an additional relocation-ish step will
> be needed in tez as e.g. hadoop expects protobuf messages with
> org.apache.hadoop.thirdparty.protobuf package, even if it's not exposed to
> public APIs, for example:
> {code}
> java.lang.ClassCastException:
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto
> cannot be cast to org.apache.hadoop.thirdparty.protobuf.Message
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy11.submitDAG(Unknown Source)
> at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:706)
> at org.apache.tez.client.TezClient.submitDAG(TezClient.java:593)
> at
> org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting(TestMockDAGAppMaster.java:392)
> {code}
> Here is the failing line:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230
> {code}
> final Message theRequest = (Message) args[1];
> {code}
> Relocation means a full rewrite in the binary, so here Message refers to
> org.apache.hadoop.thirdparty.protobuf.Message, but Tez supplies a
> non-relocated one, which implements com.google.protobuf.Message (the Hadoop
> method signature takes Object, so it compiles).
> So even if protobuf 2.5 is fully compatible with 3.x (not checked yet), we
> will need extra effort on the Tez side to generate protobuf messages that are
> compatible with Hadoop's relocated messages. As these classes are generated
> from proto files, we cannot and shouldn't hack them by manually editing the
> Java source code.
> I'm thinking of a Maven profile based approach that can take care of both
> protobuf 3 and relocation-compatible protobuf objects on the Tez side.