Fangchen Li created SPARK-57549:
-----------------------------------

             Summary: Scala 3 compile-time AgnosticEncoder derivation (replace 
ScalaReflection.encoderFor)
                 Key: SPARK-57549
                 URL: https://issues.apache.org/jira/browse/SPARK-57549
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core, SQL
    Affects Versions: 4.3.0
            Reporter: Fangchen Li


ScalaReflection.encoderFor[T: TypeTag] derives Dataset[T] encoders via Scala 2 
runtime reflection (TypeTag + scala.reflect.runtime.universe), which has no 
Scala 3 equivalent. It is the structural blocker for a Scala 3 build of Spark 
SQL — the one piece that needs a real reimplementation rather than a mechanical 
port.
 
The seam is already there: AgnosticEncoder is Spark's reflection-free encoder 
description, and ExpressionEncoder.apply(enc: AgnosticEncoder[T]) accepts one 
with no TypeTag. So only the derivation needs to be replaced; everything 
downstream (ExpressionEncoder, ser/deser codegen, Spark Connect) is reused 
unchanged.
 
Proposed:
 # Add a Scala 3 Mirror/inline derivation, deriveAgnosticEncoder[T], that 
emits AgnosticEncoder directly: one self-contained file in sql-api, 
depending only on AgnosticEncoder and the Scala stdlib (no new IR).
 # On the Scala 3 build, ExpressionEncoder.apply[T]() calls it instead of 
encoderFor; the reflective body stays for 2.13.
 # Drop the ~16 TypeTag context bounds in encoder-producing signatures 
(Encoders, SparkSession.implicits, Dataset/functions/Aggregator). Scala 3 
forces this regardless, since TypeTag does not exist there.
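
For readers unfamiliar with the technique in step 1, the following is a minimal 
sketch of Mirror/inline derivation in Scala 3. It is not the prototype's code 
and does not use Spark's API: ToyEncoder, IntEnc, StringEnc, ProductEnc, 
fieldEncoders, Address and Person are hypothetical names introduced only for 
illustration, with ToyEncoder standing in for AgnosticEncoder. It shows how 
field names and field encoders can be collected at compile time without a 
TypeTag.

```scala
import scala.compiletime.{constValueTuple, erasedValue, summonInline}
import scala.deriving.Mirror

// Hypothetical stand-in for Spark's AgnosticEncoder; NOT the real API.
sealed trait ToyEncoder[T]

object ToyEncoder:
  case object IntEnc extends ToyEncoder[Int]
  case object StringEnc extends ToyEncoder[String]
  final case class ProductEnc[T](fields: List[(String, ToyEncoder[?])])
      extends ToyEncoder[T]

  // Leaf encoders for the primitive types this sketch supports.
  given ToyEncoder[Int] = IntEnc
  given ToyEncoder[String] = StringEnc

  // Resolve an encoder for each field type while the call site is inlined.
  inline def fieldEncoders[Ts <: Tuple]: List[ToyEncoder[?]] =
    inline erasedValue[Ts] match
      case _: EmptyTuple => Nil
      case _: (t *: ts)  => summonInline[ToyEncoder[t]] :: fieldEncoders[ts]

  // Derive a product encoder from the compiler-supplied Mirror (no TypeTag).
  inline given derived[T](using m: Mirror.ProductOf[T]): ToyEncoder[T] =
    val names = constValueTuple[m.MirroredElemLabels].toList.map(_.toString)
    ProductEnc[T](names.zip(fieldEncoders[m.MirroredElemTypes]))

// Example types; the nested case class is derived recursively.
final case class Address(city: String)
final case class Person(name: String, age: Int, address: Address)

@main def demo(): Unit =
  println(summon[ToyEncoder[Person]])
```

A field type with no encoder in scope fails at compile time rather than at 
runtime, which is the main behavioural difference from a reflective 
derivation. A real implementation would also have to cover options, 
collections, maps, nullability and the rest of what encoderFor handles; none 
of that is attempted here.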

 
Reference (working prototype):
 # The drop-in file: 
[https://github.com/bearing-research/ProtoCatalyst/blob/main/encoder-spark/src/main/scala/protocatalyst/encoder/spark/AgnosticDerivation.scala|http://example.com]
 (one file with no protocatalyst dependencies: rename the package and it is 
encoderFor's Scala 3 replacement; it is compiled under 
org.apache.spark.sql.catalyst.encoders on every build).
 # Validated against Spark's own reflective encoderFor goldens (structural 
parity) and round-tripped through Spark's unmodified ser/deser codegen. Docs: 
docs/scala3-encoder/REPORT.md, docs/scala3-encoder/MIGRATION.md.


