[
https://issues.apache.org/jira/browse/SPARK-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Lian resolved SPARK-15732.
--------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
Resolved by https://github.com/apache/spark/pull/13485
> Dataset generated code "generated.java" Fails with Certain Case Classes
> -----------------------------------------------------------------------
>
> Key: SPARK-15732
> URL: https://issues.apache.org/jira/browse/SPARK-15732
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Environment: Version 2.0 Preview on the Databricks Community Edition
> Reporter: Sanjay Dasgupta
> Assignee: Wenchen Fan
> Priority: Critical
> Fix For: 2.0.0
>
>
> The Dataset code generation logic fails to handle case class field names
> that are also Java keywords (e.g. "abstract"). Scala has an escaping
> mechanism (backquotes) that allows Java (and Scala) keywords to be used
> as identifiers in programs, as in the example below:
> case class PatApp(number: Int, title: String, `abstract`: String)
> But this case class trips up the Dataset code generator. The following error
> message is displayed when a Dataset containing instances of such a case class
> is processed:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in
> stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0
> (TID 1304, localhost): java.lang.RuntimeException: Error while encoding:
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to
> compile: org.codehaus.commons.compiler.CompileException: File
> 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "."
> The following code can be used to replicate the problem. This code was run on
> the Databricks CE, in a Scala notebook, in 3 separate cells as shown below:
> // CELL 1:
> //
> // Create a Case Class with "abstract" as a field-name ...
> //
> package keywordissue
> // The field-name abstract is a Java keyword ...
> case class PatApp(number: Int, title: String, `abstract`: String)
> // CELL 2:
> //
> // Create a Dataset using the case class ...
> //
> import keywordissue.PatApp
> val applications = List(
>   PatApp(1001, "1001", "Abstract 1001"),
>   PatApp(1002, "1002", "Abstract 1002"),
>   PatApp(1003, "1003", "Abstract for 1003"),
>   PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004"))
> val appsDataset = sc.parallelize(applications).toDF.as[PatApp]
> // CELL 3:
> //
> // Force Dataset code-generation. This causes the error message to display ...
> //
> val duplicates = appsDataset.groupByKey(_.number).
>   mapGroups((k, i) => (k, i.length)).
>   filter(_._2 > 0)
> duplicates.collect().foreach(println)
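
For context, the failure mode described in the report can be illustrated without Spark. The sketch below is not the change made in the pull request. It only assumes that the generated Java source refers to case class members by name, which is what the compile error suggests. The object name KeywordFieldCheck and the helper javaKeywordFields are hypothetical, and javax.lang.model.SourceVersion.isKeyword stands in for whatever keyword list a code generator might consult.

```scala
import javax.lang.model.SourceVersion

object KeywordFieldCheck {
  // Same shape as the case class in the report; backquotes let Scala
  // use a Java/Scala keyword as an ordinary identifier.
  case class PatApp(number: Int, title: String, `abstract`: String)

  // Hypothetical helper: returns the names that are Java keywords and
  // therefore cannot appear as identifiers in Java source code.
  def javaKeywordFields(fieldNames: Seq[String]): Seq[String] =
    fieldNames.filter(name => SourceVersion.isKeyword(name))

  def main(args: Array[String]): Unit = {
    // In bytecode the field is literally named "abstract"; the name is
    // only a problem when it is written out in Java source text.
    val fieldNames = classOf[PatApp].getDeclaredFields.map(_.getName).toSeq
    javaKeywordFields(fieldNames).foreach { name =>
      println(s"Field `$name` is a Java keyword and would not compile in generated Java source")
    }
  }
}
```

A check along these lines could run before any Java source is generated, so that the user sees the offending field name rather than a compiler error pointing into 'generated.java'.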