[ https://issues.apache.org/jira/browse/SPARK-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved SPARK-15732.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Resolved by https://github.com/apache/spark/pull/13485

> Dataset generated code "generated.java" Fails with Certain Case Classes
> -----------------------------------------------------------------------
>
>                 Key: SPARK-15732
>                 URL: https://issues.apache.org/jira/browse/SPARK-15732
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Version 2.0 Preview on the Databricks Community Edition
>            Reporter: Sanjay Dasgupta
>            Assignee: Wenchen Fan
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> The Dataset code generation logic fails to handle case class field names 
> that are also Java keywords (e.g. "abstract"). Scala has an escaping 
> mechanism (backquotes) that allows Java (and Scala) keywords to be used 
> as identifiers, as in the example below:
> case class PatApp(number: Int, title: String, `abstract`: String)
> But this case class trips up the Dataset code generator. The following error 
> message is displayed when a Dataset containing instances of such a case class 
> is processed:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in 
> stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 
> (TID 1304, localhost): java.lang.RuntimeException: Error while encoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "."
> The following code reproduces the problem. It was run on the Databricks 
> Community Edition, in a Scala notebook, in three separate cells as shown below:
> // CELL 1:
> //
> // Create a Case Class with "abstract" as a field-name ...
> //
> package keywordissue
> // The field-name abstract is a Java keyword ...
> case class PatApp(number: Int, title: String, `abstract`: String)
> // CELL 2:
> //
> // Create a Dataset using the case class ...
> //
> import keywordissue.PatApp
> val applications = List(
>   PatApp(1001, "1001", "Abstract 1001"),
>   PatApp(1002, "1002", "Abstract 1002"),
>   PatApp(1003, "1003", "Abstract for 1003"),
>   PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004"))
> val appsDataset = sc.parallelize(applications).toDF.as[PatApp]
> // CELL 3:
> //
> // Force Dataset code-generation. This causes the error message to display ...
> //
> val duplicates = appsDataset
>   .groupByKey(_.number)
>   .mapGroups((k, i) => (k, i.length))
>   .filter(_._2 > 0)
> duplicates.collect().foreach(println)
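
For readers without a Spark 2.0 preview cluster at hand, the naming clash described above can be illustrated in plain Scala. The sketch below is not the fix from the pull request and does not reproduce the Spark failure itself. It only shows that Scala accepts the backquoted field while the same name is reserved in Java source, which is what a generator emitting Java code would have to account for. The object KeywordFieldCheck and the helper javaKeywordFields are hypothetical names introduced for illustration; SourceVersion.isKeyword is a standard JDK call.

```scala
import javax.lang.model.SourceVersion

// Same shape as the case class in the report; backquotes let Scala accept
// the keyword `abstract` as a field name.
case class PatApp(number: Int, title: String, `abstract`: String)

object KeywordFieldCheck {
  // Hypothetical helper (not part of Spark): returns the declared field names
  // of a class that are reserved words in Java source code.
  def javaKeywordFields(cls: Class[_]): List[String] =
    cls.getDeclaredFields.toList
      .filterNot(_.isSynthetic)                      // skip compiler-generated fields
      .map(_.getName)
      .filter(name => SourceVersion.isKeyword(name)) // JDK check for Java keywords

  def main(args: Array[String]): Unit = {
    val app = PatApp(1001, "1001", "Abstract 1001")
    // Plain Scala reads the field without trouble when it is backquoted.
    println(app.`abstract`)
    // Lists the field names that would be illegal as bare Java identifiers.
    println(javaKeywordFields(classOf[PatApp]))
  }
}
```

A check along these lines could be run against a case class before building a Dataset from it on an affected version, assuming the failure is triggered by the field name alone as the report suggests.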



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

