[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459649#comment-15459649
 ] 

Aris Vlasakakis commented on SPARK-17368:
-----------------------------------------

My first thought was the same, based on my experience with value classes 
and how they disappear in the JVM bytecode. It would be very helpful if the 
documentation stated explicitly that Scala value classes are not supported 
by Spark Datasets. The error messages are cryptic and confusing. Even better 
would be some kind of macro support, or some other mechanism in Spark, that 
finds such usages in your code and calls them out, but that's wishful 
thinking.
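
To illustrate what I mean by value classes disappearing, here is a minimal 
sketch in plain Scala, with no Spark involved. The names ValueClassErasure, 
describe and describeParamTypes are made up for this example; FeatureId 
mirrors the class from the report. It uses Java reflection to list the 
JVM-level parameter types of a method that takes a FeatureId. I would expect 
it to report the underlying primitive rather than FeatureId, which would be 
consistent with the "Couldn't find v on int" message.

```scala
object ValueClassErasure {
  // Same shape as the value class in the report.
  case class FeatureId(v: Int) extends AnyVal

  // Hypothetical method, present only so there is a signature to inspect.
  def describe(id: FeatureId): Int = id.v

  // JVM-level parameter type names of `describe`, read via Java reflection.
  def describeParamTypes: Seq[String] =
    getClass.getMethods.toSeq
      .filter(_.getName == "describe")
      .flatMap(_.getParameterTypes.toSeq.map(_.getName))

  def main(args: Array[String]): Unit =
    println(s"describe takes: ${describeParamTypes.mkString(", ")}")
}
```

If the reported parameter type is the primitive, then anything that inspects 
the value at runtime sees a bare Int with no field named v to look up.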

> Scala value classes create encoder problems and break at runtime
> ----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>
> Using Scala value classes as the element type of a Dataset breaks in Spark 
> 2.0 and 1.6.X.
> This simple Spark 2 application demonstrates the problem: the code compiles, 
> but fails at runtime with the error below. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>         at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>         at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>         at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}
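
One possible workaround, sketched here under assumptions rather than taken 
from the report, is to keep FeatureId in application code but hand Spark only 
the underlying Int, for which spark.implicits._ provides an encoder. The 
names ValueClassWorkaround, unwrap and wrap are hypothetical helpers, not 
part of Spark, and the local master and app name are chosen only so the 
example can run standalone with Spark 2.x on the classpath.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object ValueClassWorkaround {
  // Same value class as in the report.
  case class FeatureId(v: Int) extends AnyVal

  // Hypothetical helpers: convert at the boundary between app code and Spark.
  def unwrap(ids: Seq[FeatureId]): Seq[Int] = ids.map(_.v)
  def wrap(raw: Seq[Int]): Seq[FeatureId] = raw.map(FeatureId(_))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without spark-submit.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("value-class-workaround")
      .getOrCreate()
    import spark.implicits._

    val ids = Seq(FeatureId(1), FeatureId(2), FeatureId(3))

    // Spark only ever sees Dataset[Int], never Dataset[FeatureId].
    val ds: Dataset[Int] = spark.createDataset(unwrap(ids))
    println(s"count: ${ds.count()}")

    // Re-wrap on the driver once the data has left the Dataset.
    val roundTripped: Seq[FeatureId] = wrap(ds.collect().toSeq)
    println(roundTripped)

    spark.stop()
  }
}
```

The trade-off is that the type safety of FeatureId is lost inside the 
Dataset itself; a regular case class that does not extend AnyVal avoids that, 
at the cost of an extra allocation per value.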


