[ https://issues.apache.org/jira/browse/SPARK-57549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fangchen Li updated SPARK-57549:
--------------------------------
    Description: 
ScalaReflection.encoderFor[T: TypeTag] derives Dataset[T] encoders via Scala 2 
runtime reflection (TypeTag + scala.reflect.runtime.universe), which has no 
Scala 3 equivalent. It is the structural blocker for a Scala 3 build of Spark 
SQL — the one piece that needs a real reimplementation rather than a mechanical 
port.
 
The seam is already there: AgnosticEncoder is Spark's reflection-free encoder 
description, and ExpressionEncoder.apply(enc: AgnosticEncoder[T]) accepts one 
with no TypeTag. So only the derivation needs to be replaced; everything 
downstream (ExpressionEncoder, ser/deser codegen, Spark Connect) is reused 
unchanged.
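A minimal sketch of why this seam is enough, assuming a heavily simplified stand-in for AgnosticEncoder. The names `Desc`, `schemaOf` and `serialize` are hypothetical and are not Spark APIs. Once the encoder is a plain data value, the downstream consumers (here playing the role of ExpressionEncoder and the ser/deser codegen) only pattern-match on that value and never need a TypeTag for T.

```scala
// Hypothetical, simplified stand-in for AgnosticEncoder (not Spark's API):
// a plain value that describes how to encode a T, built without reflection.
sealed trait Desc[T]
case object IntDesc extends Desc[Int]
case object StringDesc extends Desc[String]
final case class OptionDesc[A](elem: Desc[A]) extends Desc[Option[A]]
final case class ProductDesc[T](name: String, fields: List[(String, Desc[?])])
    extends Desc[T]

// Consumers in the role of ExpressionEncoder and the ser/deser codegen:
// they pattern-match on the description and never inspect T reflectively.
def schemaOf(d: Desc[?]): String = d match
  case IntDesc       => "int"
  case StringDesc    => "string"
  case OptionDesc(e) => schemaOf(e) // nullability is not modelled in this sketch
  case ProductDesc(_, fields) =>
    fields.map((n, f) => s"$n:${schemaOf(f)}").mkString("struct<", ",", ">")

// Flattens a value into a row-like Vector, guided only by the description.
// None becomes null; nested products become nested Vectors.
def serialize(d: Desc[?], v: Any): Any = (d, v) match
  case (OptionDesc(e), o: Option[?]) => o.map(serialize(e, _)).orNull
  case (ProductDesc(_, fields), p: Product) =>
    fields.zip(p.productIterator).map { case ((_, f), x) => serialize(f, x) }.toVector
  case (_, x) => x

final case class Person(name: String, age: Int, nick: Option[String])

// Built by hand here; a compile-time derivation would produce an equal value.
val person: Desc[Person] = ProductDesc[Person]("Person",
  List("name" -> StringDesc, "age" -> IntDesc, "nick" -> OptionDesc(StringDesc)))

@main def demo(): Unit =
  println(schemaOf(person))
  println(serialize(person, Person("Ada", 36, None)))
```

The point of the design is that nothing after the description is built depends on how it was built, which is what lets a Scala 3 derivation replace the reflective one without touching the consumers.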
 
Proposed:
# Add a Scala 3 Mirror/inline derivation, deriveAgnosticEncoder[T], that emits 
an AgnosticEncoder directly: one self-contained file in sql-api that depends 
only on AgnosticEncoder and the Scala stdlib (no new IR).
# On the Scala 3 build, ExpressionEncoder.apply[T]() calls it instead of 
encoderFor; the reflective body stays for 2.13.
# Drop the ~16 TypeTag context bounds in encoder-producing signatures 
(Encoders, SparkSession.implicits, Dataset/functions/Aggregator). Scala 3 
forces this regardless, since TypeTag does not exist there.
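A rough sketch of what a Mirror/inline derivation looks like in plain Scala 3. It assumes the same kind of hypothetical `Desc` stand-in rather than the real AgnosticEncoder hierarchy, and `deriveDesc` only plays the role the proposal gives deriveAgnosticEncoder[T]; it is not the prototype's code. Class names, field names and field types come from the compiler-synthesized `scala.deriving.Mirror`, and `scala.compiletime` walks them during compilation, so no runtime reflection or TypeTag is involved. Dropping the TypeTag bound then amounts to requiring a `Desc[T]` given instead.

```scala
import scala.compiletime.{constValue, erasedValue, summonInline}
import scala.deriving.Mirror

// Hypothetical, simplified stand-in for AgnosticEncoder (not Spark's API).
sealed trait Desc[T]
case object IntDesc extends Desc[Int]
case object StringDesc extends Desc[String]
final case class OptionDesc[A](elem: Desc[A]) extends Desc[Option[A]]
final case class ProductDesc[T](name: String, fields: List[(String, Desc[?])])
    extends Desc[T]

object Desc:
  // Leaf and container instances.
  given Desc[Int] = IntDesc
  given Desc[String] = StringDesc
  given [A](using elem: Desc[A]): Desc[Option[A]] = OptionDesc(elem)

  // Products: the compiler synthesizes Mirror.ProductOf[T] for case classes
  // and tuples. Its type members carry the class name, field names and field
  // types as compile-time information, standing in for the TypeTag.
  inline given derived[T](using m: Mirror.ProductOf[T]): Desc[T] =
    ProductDesc[T](
      constValue[m.MirroredLabel],
      fieldsOf[m.MirroredElemLabels, m.MirroredElemTypes])

  // Walks the label and type tuples in lockstep at compile time,
  // summoning one Desc per field (recursing into nested case classes).
  inline def fieldsOf[Ls <: Tuple, Ts <: Tuple]: List[(String, Desc[?])] =
    inline erasedValue[(Ls, Ts)] match
      case _: (EmptyTuple, EmptyTuple) => Nil
      case _: (l *: ls, t *: ts) =>
        (constValue[l].toString, summonInline[Desc[t]]) :: fieldsOf[ls, ts]

// Hypothetical entry point, analogous in role to deriveAgnosticEncoder[T].
def deriveDesc[T](using d: Desc[T]): Desc[T] = d

final case class Address(city: String, zip: Option[String])
final case class Person(name: String, age: Int, home: Address)

// Hand-written expected description, in the spirit of a golden file.
val golden: Desc[Person] = ProductDesc[Person]("Person", List(
  "name" -> StringDesc,
  "age" -> IntDesc,
  "home" -> ProductDesc[Address]("Address",
    List("city" -> StringDesc, "zip" -> OptionDesc(StringDesc)))))

@main def demo(): Unit =
  val derived = deriveDesc[Person]
  println(derived)
  println(s"structural parity with golden: ${derived == golden}")
```

Comparing the derived value structurally against a hand-written expectation is a toy version of the parity check described for the prototype; the real check compares against the output of Spark's reflective encoderFor.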

 
Reference (working prototype):
# The drop-in file: 
https://github.com/bearing-research/ProtoCatalyst/blob/main/encoder-spark/src/main/scala/protocatalyst/encoder/spark/AgnosticDerivation.scala
(one file with no protocatalyst dependencies; rename the package and it is 
encoderFor's Scala 3 replacement. It is compiled under 
org.apache.spark.sql.catalyst.encoders on every build.)
# Validated against Spark's own reflective encoderFor golden outputs 
(structural parity) and round-tripped through Spark's unmodified ser/deser 
codegen. Docs: docs/scala3-encoder/REPORT.md, docs/scala3-encoder/MIGRATION.md.



> Scala 3 compile-time AgnosticEncoder derivation (replace 
> ScalaReflection.encoderFor)
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-57549
>                 URL: https://issues.apache.org/jira/browse/SPARK-57549
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 4.3.0
>            Reporter: Fangchen Li
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
