[ https://issues.apache.org/jira/browse/TEZ-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090713#comment-17090713 ]
László Bodor edited comment on TEZ-4152 at 4/27/20, 4:15 PM: ------------------------------------------------------------- UPDATE: feel free to skip this comment, now it seems like I'm proceeding in TEZ-4154 seems like it's a bit harder than I first thought my plan was a solution where we don't need to change tez protobuf code when we upgrade to hadoop 3.3, I was thinking of some wrappers, but I bumped into walls everywhere, so far Here are the current constraints: 1. Hadoop's rpc engine will cast the provided message instance to its current Message class (now it's com.google.protobuf.Message, from hadoop 3.3, it'll be a org.apache.hadoop.thirdparty.protobuf) https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230 2. Considering the exception trace above, let's start from TezClient.submitDAGSession, which calls a method which implements a generated interface and expects a specific message subclass: SubmitDAGRequestProto https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientAMProtocolBlockingPBServerImpl.java#L159 SubmitDAGRequestProto is final So I need an instance which is SubmitDAGRequestProto but cannot be a subclass of it (it's final) but at the same time I need an abstraction from Message interface (as I want is to be easily pluggable) 1. if I simply wrap original SubmitDAGRequestProto into a <? extends Message>, which esentially a superclass of SubmitDAGRequestProto, I would solve the hadoop adaptation (as the wrapper can be changed easily for the whole project), but I cannot cast it back to SubmitDAGRequestProto 2. I cannot even use a subclass of SubmitDAGRequestProto for wrapping it, becuase SubmitDAGRequestProto is generated as final by protobuf by design cc: [~ashutoshc], [~jeagles] or we can do hadoop 3.3 upgrade with a breaking change, where we simply adapt to its new thirdparty protobuf package I still think there is a clean, non-breaking way of creating some abstraction layer between hadoop and tez, but I haven't figured out (I was thinking about shading as always, but in this case shading itself is kind of orthogonal to the problems mentioned above) was (Author: abstractdog): seems like it's a bit harder than I first thought my plan was a solution where we don't need to change tez protobuf code when we upgrade to hadoop 3.3, I was thinking of some wrappers, but I bumped into walls everywhere, so far Here are the current constraints: 1. Hadoop's rpc engine will cast the provided message instance to its current Message class (now it's com.google.protobuf.Message, from hadoop 3.3, it'll be a org.apache.hadoop.thirdparty.protobuf) https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230 2. Considering the exception trace above, let's start from TezClient.submitDAGSession, which calls a method which implements a generated interface and expects a specific message subclass: SubmitDAGRequestProto https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientAMProtocolBlockingPBServerImpl.java#L159 SubmitDAGRequestProto is final So I need an instance which is SubmitDAGRequestProto but cannot be a subclass of it (it's final) but at the same time I need an abstraction from Message interface (as I want is to be easily pluggable) 1. if I simply wrap original SubmitDAGRequestProto into a <? extends Message>, which esentially a superclass of SubmitDAGRequestProto, I would solve the hadoop adaptation (as the wrapper can be changed easily for the whole project), but I cannot cast it back to SubmitDAGRequestProto 2. I cannot even use a subclass of SubmitDAGRequestProto for wrapping it, becuase SubmitDAGRequestProto is generated as final by protobuf by design cc: [~ashutoshc], [~jeagles] or we can do hadoop 3.3 upgrade with a breaking change, where we simply adapt to its new thirdparty protobuf package I still think there is a clean, non-breaking way of creating some abstraction layer between hadoop and tez, but I haven't figured out (I was thinking about shading as always, but in this case shading itself is kind of orthogonal to the problems mentioned above) > Upgrade to protobuf 3.x and take care of relocated protobuf classes > ------------------------------------------------------------------- > > Key: TEZ-4152 > URL: https://issues.apache.org/jira/browse/TEZ-4152 > Project: Apache Tez > Issue Type: Sub-task > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > > Jiras under HADOOP-13363 cover the process of protobuf upgrade and relocation > in Hadoop. > Tez is on protobuf 2.5, while hadoop 3.3 is on protobuf 3.x. > Tez usually follows hadoop with dependencies, so a hadoop 3.3 upgrade means a > protobuf upgrade in tez as well, and an additional relocation-ish step will > be needed in tez as e.g. hadoop expects protobuf messages with > org.apache.hadoop.thirdparty.protobuf package, even if it's not exposed to > public APIs, for example: > {code} > java.lang.ClassCastException: > org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto > cannot be cast to org.apache.hadoop.thirdparty.protobuf.Message > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy11.submitDAG(Unknown Source) > at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:706) > at org.apache.tez.client.TezClient.submitDAG(TezClient.java:593) > at > org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting(TestMockDAGAppMaster.java:392) > {code} > Here is the failing line: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L230 > {code} > final Message theRequest = (Message) args[1]; > {code} > relocation means full rewrite in binary, so here Message refers to > org.apache.hadoop.thirdparty.protobuf.Message, but tez supplies a > non-relocated one, which implements com.google.protobuf.Message (hadoop > method signature contains Object, so it compiles) > so even if protobuf 2.5 is fully compatible with 3.x (not checked yet), we > will need extra effort on tez side to generate protobuf messages which are > compatible with hadoop relocated messages...as these classes are generated > from proto files, we cannot and shouldn't hack them by manual java source > code manipulation > I'm thinking of a maven profile based approach which can take care of both > protobuf 3 and relocation compatible protobuf objects from tez side -- This message was sent by Atlassian Jira (v8.3.4#803005)