[jira] [Updated] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

Reynold Xin (JIRA) Wed, 20 Jan 2016 12:58:49 -0800

     [ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Reynold Xin updated SPARK-12932:
--------------------------------
    Description: 
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the root cause was that my Person class had neither a 
default constructor nor setter methods.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.

The full stack trace is:
{code}
Exception in thread "main" java.lang.UnsupportedOperationException: no encoder found for example_java.common.Person
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
        at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
        at org.apache.spark.sql.Encoders.bean(Encoder.scala)
{code}

NOTE that if I do provide EITHER the default constructor OR the setters, but 
not both, then I get a stack trace with much more useful information, but 
omitting BOTH causes this issue.
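
A minimal sketch of the kind of check that could back a more descriptive message, assuming plain reflection is enough to detect the two missing pieces named above. BeanComplianceCheck and its sample classes are hypothetical names used for illustration only; this is not how Spark's JavaTypeInference is implemented.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: reports which JavaBean
// requirements a class is missing, so an error message can name them.
final class BeanComplianceCheck {

    private BeanComplianceCheck() {
    }

    static List<String> problems(Class<?> cls) {
        List<String> problems = new ArrayList<>();

        // Requirement 1: a no-argument constructor.
        try {
            cls.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            problems.add("no no-argument constructor");
        }

        // Requirement 2: every public getX() has a public setX(...) taking
        // the getter's return type. Boolean isX() getters are left out here
        // for brevity.
        for (Method getter : cls.getMethods()) {
            String name = getter.getName();
            boolean isGetter = name.startsWith("get")
                    && name.length() > 3
                    && getter.getParameterCount() == 0
                    && getter.getReturnType() != void.class
                    && getter.getDeclaringClass() != Object.class;
            if (!isGetter) {
                continue;
            }
            String setterName = "set" + name.substring(3);
            try {
                cls.getMethod(setterName, getter.getReturnType());
            } catch (NoSuchMethodException e) {
                problems.add("no setter " + setterName + "(...) for " + name + "()");
            }
        }
        return problems;
    }

    // Sample class with a getter only and no no-argument constructor.
    static class GetterOnly {
        private final String first;

        public GetterOnly(String first) {
            this.first = first;
        }

        public String getFirst() {
            return first;
        }
    }

    // Sample class that meets both requirements.
    static class FullBean {
        private String first;

        public FullBean() {
        }

        public String getFirst() {
            return first;
        }

        public void setFirst(String first) {
            this.first = first;
        }
    }

    public static void main(String[] args) {
        System.out.println(problems(GetterOnly.class));
        System.out.println(problems(FullBean.class));
    }
}
```

An error message built from the returned list could name each missing piece instead of only saying that no encoder was found.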

The original source is below.

{code:title=Example.java}
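import java.util.List;

import com.google.common.collect.ImmutableList;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SQLContext;

public class JavaDatasetExample {

    public static void main(String[] args) throws Exception {

        SparkConf sparkConf = new SparkConf()
                .setAppName("Example")
                .setMaster("local[*]");

        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        SQLContext sqlContext = new SQLContext(sc);

        List<Person> people = ImmutableList.of(
                new Person("Joe", "Bloggs", 21, "NY")
        );

        // Encoders.bean is the call that appears in the stack trace above
        Dataset<Person> dataset = sqlContext.createDataset(people,
                Encoders.bean(Person.class));
    }
}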

{code}

{code:title=Person.java}
class Person implements Serializable {

    String first;
    String last;
    int age;
    String state;

    public Person() {
    }

    public Person(String first, String last, int age, String state) {
        this.first = first;
        this.last = last;
        this.age = age;
        this.state = state;
    }

    public String getFirst() {
        return first;
    }

    public String getLast() {
        return last;
    }

    public int getAge() {
        return age;
    }

    public String getState() {
        return state;
    }

}
{code}
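
For comparison, a sketch of a variant of this class with a setter for every getter, which together with the no-argument constructor covers the two requirements named in the description. BeanCompliantPerson is a hypothetical name, the class visibility is kept as in the report, and whether this alone satisfies Encoders.bean on a given Spark version has not been verified here.

```java
import java.io.Serializable;

// Hypothetical bean-style variant of the Person class from the report:
// a no-argument constructor plus a getter and setter per field.
class BeanCompliantPerson implements Serializable {

    private String first;
    private String last;
    private int age;
    private String state;

    public BeanCompliantPerson() {
    }

    public BeanCompliantPerson(String first, String last, int age, String state) {
        this.first = first;
        this.last = last;
        this.age = age;
        this.state = state;
    }

    public String getFirst() {
        return first;
    }

    public void setFirst(String first) {
        this.first = first;
    }

    public String getLast() {
        return last;
    }

    public void setLast(String last) {
        this.last = last;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getState() {
        return state;
    }

    public void setState(String state) {
        this.state = state;
    }
}
```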


  was:
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the root cause was that my Person class had neither a 
default constructor nor setter methods.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.

The full stack trace is:

{{Exception in thread "main" java.lang.UnsupportedOperationException: no encoder found for example_java.common.Person
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
        at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
        at org.apache.spark.sql.Encoders.bean(Encoder.scala)
}}

NOTE that if I do provide EITHER the default constructor OR the setters, but 
not both, then I get a stack trace with much more useful information, but 
omitting BOTH causes this issue.

The original source is below.

{code:title=Example.java}
public class JavaDatasetExample {

    public static void main(String[] args) throws Exception {

        SparkConf sparkConf = new SparkConf()
                .setAppName("Example")
                .setMaster("local[*]");

        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        SQLContext sqlContext = new SQLContext(sc);

        List<Person> people = ImmutableList.of(
                new Person("Joe", "Bloggs", 21, "NY")
        );

        Dataset<Person> dataset = sqlContext.createDataset(people,
                Encoders.bean(Person.class));
    }
}

{code}

{code:title=Person.java}
class Person implements Serializable {

    String first;
    String last;
    int age;
    String state;

    public Person() {
    }

    public Person(String first, String last, int age, String state) {
        this.first = first;
        this.last = last;
        this.age = age;
        this.state = state;
    }

    public String getFirst() {
        return first;
    }

    public String getLast() {
        return last;
    }

    public int getAge() {
        return age;
    }

    public String getState() {
        return state;
    }

}
{code}



> Bad error message with trying to create Dataset from RDD of Java objects that 
> are not bean-compliant
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12932
>                 URL: https://issues.apache.org/jira/browse/SPARK-12932
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.6.0
>         Environment: Ubuntu 15.10 / Java 8
>            Reporter: Andy Grove
>
> When trying to create a Dataset from an RDD of Person (all using the Java 
> API), I got the error "java.lang.UnsupportedOperationException: no encoder 
> found for example_java.dataset.Person". This is not a very helpful error and 
> no other logging information was apparent to help troubleshoot this.
> It turned out that the root cause was that my Person class had neither a 
> default constructor nor setter methods.
> This JIRA is for implementing a more useful error message to help Java 
> developers who are trying out the Dataset API for the first time.
> The full stack trace is:
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: no encoder found for example_java.common.Person
>       at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
>       at org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
>       at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
>       at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
>       at org.apache.spark.sql.Encoders.bean(Encoder.scala)
> {code}
> NOTE that if I do provide EITHER the default constructor OR the setters, but 
> not both, then I get a stack trace with much more useful information, but 
> omitting BOTH causes this issue.
> The original source is below.
> {code:title=Example.java}
> public class JavaDatasetExample {
>     public static void main(String[] args) throws Exception {
>         SparkConf sparkConf = new SparkConf()
>                 .setAppName("Example")
>                 .setMaster("local[*]");
>         JavaSparkContext sc = new JavaSparkContext(sparkConf);
>         SQLContext sqlContext = new SQLContext(sc);
>         List<Person> people = ImmutableList.of(
>                 new Person("Joe", "Bloggs", 21, "NY")
>         );
>         Dataset<Person> dataset = sqlContext.createDataset(people,
>                 Encoders.bean(Person.class));
>     }
> }
> {code}
> {code:title=Person.java}
> class Person implements Serializable {
>     String first;
>     String last;
>     int age;
>     String state;
>     public Person() {
>     }
>     public Person(String first, String last, int age, String state) {
>         this.first = first;
>         this.last = last;
>         this.age = age;
>         this.state = state;
>     }
>     public String getFirst() {
>         return first;
>     }
>     public String getLast() {
>         return last;
>     }
>     public int getAge() {
>         return age;
>     }
>     public String getState() {
>         return state;
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

