[
https://issues.apache.org/jira/browse/SPARK-57549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fangchen Li updated SPARK-57549:
--------------------------------
Description:
ScalaReflection.encoderFor[T: TypeTag] derives Dataset[T] encoders via Scala 2
runtime reflection (TypeTag + scala.reflect.runtime.universe), which has no
Scala 3 equivalent. It is the structural blocker for a Scala 3 build of Spark
SQL — the one piece that needs a real reimplementation rather than a mechanical
port.
The seam is already there: AgnosticEncoder is Spark's reflection-free encoder
description, and ExpressionEncoder.apply(enc: AgnosticEncoder[T]) accepts one
with no TypeTag. So only the derivation needs to be replaced; everything
downstream (ExpressionEncoder, ser/deser codegen, Spark Connect) is reused
unchanged.
Proposed:
 1. Add a Scala 3 Mirror/inline derivation, deriveAgnosticEncoder[T], that
    emits AgnosticEncoder directly: one self-contained file in sql-api,
    depending only on AgnosticEncoder and the Scala stdlib (no new IR).
 2. On the Scala 3 build, ExpressionEncoder.apply[T]() calls it instead of
    encoderFor; the reflective body stays for 2.13.
 3. Drop the ~16 TypeTag context bounds in encoder-producing signatures
    (Encoders, SparkSession.implicits, Dataset/functions/Aggregator). Scala 3
    forces this regardless, because TypeTag does not exist there.
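To illustrate the kind of Mirror/inline derivation proposed above, here is a minimal Scala 3 sketch. It does not use Spark: Enc, FieldEnc, ProductEnc and derived are hypothetical stand-ins for AgnosticEncoder and deriveAgnosticEncoder, chosen only to show how field names and field encoders can be recovered at compile time without a TypeTag. The linked prototype file remains the actual reference.

```scala
import scala.compiletime.{constValueTuple, erasedValue, summonInline}
import scala.deriving.Mirror

// Hypothetical stand-in for Spark's AgnosticEncoder; NOT the real API.
sealed trait Enc[T]
case object IntEnc extends Enc[Int]
case object LongEnc extends Enc[Long]
case object StringEnc extends Enc[String]
final case class FieldEnc(name: String, enc: Enc[?])
final case class ProductEnc[T](fields: List[FieldEnc]) extends Enc[T]

object Enc:
  given Enc[Int] = IntEnc
  given Enc[Long] = LongEnc
  given Enc[String] = StringEnc

  // Summon an encoder for each element of the tuple of field types.
  inline def fieldEncs[Ts <: Tuple]: List[Enc[?]] =
    inline erasedValue[Ts] match
      case _: EmptyTuple => Nil
      case _: (h *: t)   => summonInline[Enc[h]] :: fieldEncs[t]

  // Derive a product encoder from the compiler-supplied Mirror (no TypeTag).
  inline given derived[T <: Product](using m: Mirror.ProductOf[T]): Enc[T] =
    val names = constValueTuple[m.MirroredElemLabels].toList.map(_.toString)
    val fields = names.zip(fieldEncs[m.MirroredElemTypes]).map {
      case (name, enc) => FieldEnc(name, enc)
    }
    ProductEnc[T](fields)

// Example case class, assumed for illustration only.
final case class User(id: Long, name: String)

@main def demo(): Unit =
  // Resolved entirely at compile time via the Mirror for User.
  println(summon[Enc[User]])
```

Because the derivation is an inline given, a field type with no encoder in scope is reported as a compile error at the call site, where the reflective path would fail at runtime.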
Reference (working prototype):
 1. The drop-in file:
    https://github.com/bearing-research/ProtoCatalyst/blob/main/encoder-spark/src/main/scala/protocatalyst/encoder/spark/AgnosticDerivation.scala
    (one file with no protocatalyst dependencies: rename the package and it is
    encoderFor's Scala 3 replacement; compiled under
    org.apache.spark.sql.catalyst.encoders on every build).
 2. Validated against Spark's own reflective encoderFor goldens (structural
    parity) and round-tripped through Spark's unmodified ser/deser codegen.
    Docs: docs/scala3-encoder/REPORT.md, docs/scala3-encoder/MIGRATION.md.
> Scala 3 compile-time AgnosticEncoder derivation (replace
> ScalaReflection.encoderFor)
> ------------------------------------------------------------------------------------
>
> Key: SPARK-57549
> URL: https://issues.apache.org/jira/browse/SPARK-57549
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 4.3.0
> Reporter: Fangchen Li
> Priority: Major