[ https://issues.apache.org/jira/browse/SPARK-22663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sajeev Ramakrishnan updated SPARK-22663:
----------------------------------------
    Description: 
Dear Team,
  Currently, when we create a Dataset from a data source, we call
as[<case-class>] at the end to map the rows to the case class. However, if the
case class has an extra attribute that does not exist as a column, Spark throws
an error.

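For example:
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.{col, split}
// `spark` is assumed to be an existing SparkSession (as in spark-shell).
import spark.implicits._  // provides the Encoder required by as[MyClass]

case class MyClass(
  var line: String = "",
  var prevLine: String = ""
)

// Each line of <file> becomes one row in a single "value" column.
val raw = spark.read.textFile(<file>)
// Keep only the first tab-separated field as "line"; no "prevLine" column exists.
val a: Dataset[MyClass] = raw
  .withColumn("line", split(col("value"), "\\t"))
  .select(col("line").getItem(0).as("line"))
  .as[MyClass]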

This code fails, reporting that there is no match for the column "prevLine".

Instead, if Spark could do the mapping with only the available columns, it would
help developers build Spark programs with Datasets that involve many joins, where
each join adds more columns to the result. It is difficult to maintain a
different case class for every join result.
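Until Spark maps only the available columns, one possible workaround is to add
each missing case-class field as a typed null column before calling as[...].
The sketch below is illustrative and not part of Spark: the object
MissingColumns and the helper asWithMissingColumns are hypothetical names, and
the helper relies on Encoders.product[T].schema to list the fields the encoder
expects. Note that the encoder does not apply Scala default parameter values,
so a missing prevLine comes back as null rather than "".

```scala
import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.sql.{DataFrame, Dataset, Encoders}
import org.apache.spark.sql.functions.lit

// Same definition as in the report. The defaults ("") are NOT used by the
// encoder: a field filled in by the helper below is decoded as null.
case class MyClass(
  var line: String = "",
  var prevLine: String = ""
)

object MissingColumns {
  // Hypothetical helper: for every top-level field of T that `df` lacks, add a
  // null column cast to the field's Spark type, then convert to Dataset[T].
  // Columns that already exist are left untouched.
  def asWithMissingColumns[T <: Product : TypeTag](df: DataFrame): Dataset[T] = {
    val encoder = Encoders.product[T]
    val present = df.columns.toSet // exact-name (case-sensitive) check
    val completed = encoder.schema.fields.foldLeft(df) { (acc, field) =>
      if (present.contains(field.name)) acc
      else acc.withColumn(field.name, lit(null).cast(field.dataType))
    }
    completed.as[T](encoder)
  }
}
```

Applied to the report's example, the trailing .as[MyClass] would be replaced by
MissingColumns.asWithMissingColumns[MyClass](...) wrapped around the selected
DataFrame. The sketch only handles top-level fields, and its name check is
case-sensitive, whereas Spark resolves column names case-insensitively by
default. Fields of primitive types such as Int would also need to be declared
as Option[Int], because a null cannot be decoded into a primitive.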

Thanks & Regards,
Sajeev Ramakrishnan

  was:
Dear Team,
  Currently, when we create a Dataset from a data source, we call
as[<case-class>] at the end to map the rows to the case class. However, if the
case class has an extra attribute that does not exist as a column, Spark throws
an error.

For example:
case class MyClass(
  var line: String = "",
  var prevLine: String = ""
)

val raw = spark.read.textFile(<file>)
val a: Dataset[MyClass] = raw
  .withColumn("line", split(col("value"), "\\t"))
  .select(col("line").getItem(0).as("line"))
  .as[MyClass]

This code fails, reporting that there is no match for the column "prevLine".

Fixing this would help developers build Spark programs with Datasets that
involve many joins, where each join adds more columns to the result. It is
difficult to maintain a different case class for every join result.

Thanks & Regards,
Sajeev Ramakrishnan


> Spark DataSet to case class mapping mismatches
> ----------------------------------------------
>
>                 Key: SPARK-22663
>                 URL: https://issues.apache.org/jira/browse/SPARK-22663
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Sajeev Ramakrishnan
>            Priority: Minor
>              Labels: usability



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
