[ 
https://issues.apache.org/jira/browse/SPARK-49961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Twiner updated SPARK-49961:
---------------------------------
    Description: 
In versions prior to 4.0.0-preview2, sql.Dataset.transform had the signature:
{code:java}
def transform[U](t: (sql.Dataset[T]) => sql.Dataset[U]): sql.Dataset[U] {code}
4.0.0-preview2 has moved this to the parent class sql.api.Dataset with the 
signature:
{code:java}
def transform[U](t: (sql.api.Dataset[T]) => sql.api.Dataset[U]): sql.api.Dataset[U] {code}
so function objects and return values typed against sql.Dataset are no longer 
compatible.

It seems F-bounded polymorphism or a similar self type is needed to keep the 
types correct (e.g. if you are dealing with sql.Dataset, all types should be 
sql.Dataset). For example:
{code:java}
import sparkSession.implicits._
val ds = Seq(1, 2).toDS()
val f: Dataset[Int] => Dataset[Int] =
  d => d.selectExpr("(value + 1) value").as[Int]
val transformed = ds.transform(f)
assert(transformed.collect().sorted === Array(2, 3)) {code}
now fails to compile.
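
A minimal sketch of the F-bounded approach mentioned above, under stated 
assumptions: ApiDataset and Dataset below are hypothetical stand-ins for 
sql.api.Dataset and sql.Dataset, not Spark's actual classes, and this is only 
one possible shape for a fix. The parent takes the concrete dataset type as a 
type parameter, so transform keeps accepting and returning the concrete type:

```scala
// Hypothetical stand-ins; these are NOT Spark's real classes.
object FBoundedSketch {

  // Parent API. DS is the concrete dataset type constructor of the
  // implementation, bounded so that DS[X] is itself an ApiDataset[X, DS].
  // The self type states that every instance is also a DS[T].
  trait ApiDataset[T, DS[X] <: ApiDataset[X, DS]] { self: DS[T] =>
    // Parameter and result are expressed in terms of DS, not ApiDataset,
    // so callers keep working with the concrete type.
    def transform[U](t: DS[T] => DS[U]): DS[U] = t(self)
  }

  // Concrete implementation, playing the role of sql.Dataset.
  final case class Dataset[T](values: Seq[T]) extends ApiDataset[T, Dataset] {
    def map[U](f: T => U): Dataset[U] = Dataset(values.map(f))
  }

  def main(args: Array[String]): Unit = {
    val ds = Dataset(Seq(1, 2))
    // A function typed against the concrete class, as in the report.
    val f: Dataset[Int] => Dataset[Int] = d => d.map(_ + 1)
    // Compiles, and the result is statically a Dataset[Int].
    val transformed: Dataset[Int] = ds.transform(f)
    assert(transformed.values.sorted == Seq(2, 3))
  }
}
```

With this shape, a function typed as Dataset[Int] => Dataset[Int] is accepted 
and the result is a Dataset[Int], with no cast needed at the call site.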

  was:
In versions prior to 4.0.0-preview2, sql.Dataset.transform had the signature:
{code:java}
def transform[U](t: (sql.Dataset[T]) => sql.Dataset[U]): sql.Dataset[U] {code}
4.0.0-preview2 has moved this to the parent class sql.api.Dataset with the 
signature:
{code:java}
def transform[U](t: (sql.api.Dataset[T]) => sql.api.Dataset[U]): sql.api.Dataset[U] {code}
so function objects and return values typed against sql.Dataset are no longer 
compatible.

It seems F-bounded polymorphism or a similar self type is needed to keep the 
types correct (e.g. if you are dealing with sql.Dataset, all types should be 
sql.Dataset).

 


> Dataset.transform no longer has the correct return type
> -------------------------------------------------------
>
>                 Key: SPARK-49961
>                 URL: https://issues.apache.org/jira/browse/SPARK-49961
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Chris Twiner
>            Priority: Major
>
> In versions prior to 4.0.0-preview2, sql.Dataset.transform had the signature:
> {code:java}
> def transform[U](t: (sql.Dataset[T]) => sql.Dataset[U]): sql.Dataset[U] {code}
> 4.0.0-preview2 has moved this to the parent class sql.api.Dataset with the 
> signature:
> {code:java}
> def transform[U](t: (sql.api.Dataset[T]) => sql.api.Dataset[U]): sql.api.Dataset[U] {code}
> so function objects and return values typed against sql.Dataset are no longer 
> compatible.
> It seems F-bounded polymorphism or a similar self type is needed to keep the 
> types correct (e.g. if you are dealing with sql.Dataset, all types should be 
> sql.Dataset). For example:
> {code:java}
> import sparkSession.implicits._
> val ds = Seq(1, 2).toDS()
> val f: Dataset[Int] => Dataset[Int] =
>   d => d.selectExpr("(value + 1) value").as[Int]
> val transformed = ds.transform(f)
> assert(transformed.collect().sorted === Array(2, 3)) {code}
> now fails to compile.


