[
https://issues.apache.org/jira/browse/SPARK-51343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ben Burnett updated SPARK-51343:
--------------------------------
Fix Version/s: (was: 3.5.4)
> RelationPlugin scala signature does not match bytecode
> ------------------------------------------------------
>
> Key: SPARK-51343
> URL: https://issues.apache.org/jira/browse/SPARK-51343
> Project: Spark
> Issue Type: Bug
> Components: Connect, Connect Contrib
> Affects Versions: 3.5.4
> Reporter: Ben Burnett
> Priority: Minor
>
> I'm writing a DataFrame plugin for Spark Connect to support functionality
> that previously used py4j, and it seems like the RelationPlugin class has
> mismatched Scala and Java signatures in the binary. It seems that
> `com.google.protobuf.Any` is being shaded in the bytecode to
> `org.sparkproject.connect.protobuf.Any` but remains
> `com.google.protobuf.Any` in the Scala signature annotation.
> Here's the class as decompiled by jd-gui:
> public interface RelationPlugin {
>   Option<LogicalPlan> transform(Any paramAny,
>       SparkConnectPlanner paramSparkConnectPlanner);
> {color:#000000}}
> Here's the method signature from the bytecode:
> public abstract
> transform(Lorg/sparkproject/connect/protobuf/Any;Lorg/apache/spark/sql/connect/planner/SparkConnectPlanner;)Lscala/Option;
> Here's the class as decompiled by IntelliJ (I'm guessing it uses the Scala
> signature to inform decompilation, but I'm not sure):
> trait RelationPlugin {
>   def transform(relation: com.google.protobuf.Any,
>                 planner: SparkConnectPlanner): Option[LogicalPlan]
> }
> I'm a bit new to Scala signatures, but when I run ScalaSigParser on the
> class, I see lots of references to com.google.protobuf.Any:
> 40: TypeRefType(ThisType(com.google.protobuf),com.google.protobuf.Any,List())
> 41: ThisType(com.google.protobuf)
> 42: com.google.protobuf
> This presents a challenge: at compile time, it seems my class is validated
> against the Scala signature (which uses com.google.protobuf), but at runtime
> the JVM uses the bytecode (which uses org.sparkproject.connect.protobuf), so
> the interface effectively changes between compilation and execution. A
> potential solution is to shade protobuf to org.sparkproject.connect, as
> another maintainer did here:
> https://github.com/SemyonSinchenko/tsumugi-spark/blob/ac95948d3be24508aa236927ddc379fd36708d14/tsumugi-server/pom.xml#L247
> but that seems error-prone, and I don't want to include the protobuf jar in
> my final output.
> I understand if this isn't fixed, since it seems the interface is changing
> in Spark 4, but I'm not sure how to handle this at runtime. Is the
> solution just to shade it myself so that it passes compile checks but then
> reflects the correct runtime signature, like Semyon did?
> Apologies if I'm creating a duplicate issue, I looked and couldn't find
> anything referencing this in the existing issues. This is my first issue so
> apologies if I've linked or set this up incorrectly{color}
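
For anyone reproducing the mismatch described above, the erased JVM descriptor (the same form as the bytecode line in the report) can be printed with plain reflection, independently of the Scala signature annotation. This is only a minimal sketch, not Spark code: `DescriptorDump` and the stand-in `DemoPlugin` interface are hypothetical names, and pointing it at the real RelationPlugin assumes its fully qualified class name is passed as the first argument with the Spark Connect server jar on the classpath.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Optional;

public class DescriptorDump {

    // Hypothetical stand-in so the sketch runs without Spark on the classpath.
    public interface DemoPlugin {
        Optional<String> transform(Object relation, StringBuilder planner);
    }

    // Builds the JVM method descriptor from the erased return and parameter types,
    // i.e. what the bytecode declares, not what the Scala signature says.
    public static String descriptorOf(Method m) {
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                .toMethodDescriptorString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // With no argument, fall back to the stand-in interface.
        Class<?> cls = args.length > 0 ? Class.forName(args[0]) : DemoPlugin.class;
        for (Method m : cls.getDeclaredMethods()) {
            System.out.println(m.getName() + descriptorOf(m));
        }
    }
}
```

Run against the real interface, a descriptor that mentions `org/sparkproject/connect/protobuf/Any` rather than `com/google/protobuf/Any` would be consistent with the report: the bytecode carries the shaded type while the Scala signature does not.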
--
This message was sent by Atlassian Jira
(v8.20.10#820010)