[
https://issues.apache.org/jira/browse/SPARK-49961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-49961:
-----------------------------------
Labels: pull-request-available (was: )
> Dataset.transform no longer has the correct return type
> -------------------------------------------------------
>
> Key: SPARK-49961
> URL: https://issues.apache.org/jira/browse/SPARK-49961
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Chris Twiner
> Priority: Major
> Labels: pull-request-available
>
> In versions prior to 4.0.0-preview2, sql.Dataset.transform had the signature:
> {code:java}
> def transform[U](t: (sql.Dataset[T]) ⇒ sql.Dataset[U]): sql.Dataset[U] {code}
> 4.0.0-preview2 has moved this to the parent class sql.api.Dataset with the
> signature:
> {code:java}
> def transform[U](t: (sql.api.Dataset[T]) ⇒ sql.api.Dataset[U]):
> sql.api.Dataset[U] {code}
> which makes existing function objects and return values type-incompatible.
> It seems F-bounded polymorphism or a similar self type is needed to keep
> the types correct (e.g. if you are dealing with sql.Dataset, all types
> should be sql.Dataset). For example, the following
> {code:java}
> import sparkSession.implicits._
> val ds = Seq(1, 2).toDS()
> val f: Dataset[Int] => Dataset[Int] =
>   d => d.selectExpr("(value + 1) value").as[Int]
> val transformed = ds.transform(f)
> assert(transformed.collect().sorted === Array(2, 3)) {code}
> now fails to compile.
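
The F-bounded idea mentioned in the description can be sketched without Spark on the classpath. In the sketch below, ApiDataset and ClassicDataset are hypothetical stand-ins for sql.api.Dataset and sql.Dataset (they are not Spark's real classes), and the DS type parameter plus self type is only one possible encoding, assumed here for illustration. It shows how the parent class can declare transform once while each concrete subclass keeps its own type in both the function argument and the return value.

```scala
// Hypothetical stand-ins for sql.api.Dataset and sql.Dataset; not Spark's classes.
object FBoundedSketch {

  // Parent API. DS is the concrete Dataset family, bounded by the parent itself
  // (F-bounded), and the self type guarantees that `this` is a DS[T].
  abstract class ApiDataset[T, DS[U] <: ApiDataset[U, DS]] { self: DS[T] =>
    def values: Seq[T]

    // transform accepts and returns the concrete type, never the parent type.
    def transform[U](t: DS[T] => DS[U]): DS[U] = t(this)
  }

  // Concrete implementation: fixes DS to itself.
  final class ClassicDataset[T](val values: Seq[T])
      extends ApiDataset[T, ClassicDataset] {
    def map[U](f: T => U): ClassicDataset[U] = new ClassicDataset(values.map(f))
  }

  def main(args: Array[String]): Unit = {
    val ds = new ClassicDataset(Seq(1, 2))
    // A function typed against the concrete class, as in the report above.
    val f: ClassicDataset[Int] => ClassicDataset[Int] = d => d.map(_ + 1)
    // The result is statically a ClassicDataset[Int]; no cast is needed.
    val transformed: ClassicDataset[Int] = ds.transform(f)
    assert(transformed.values.sorted == Seq(2, 3))
  }
}
```

With this encoding, a function written against the parent type would no longer be accepted by a concrete subclass, which is the trade-off of tying transform to the self type.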