[
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin updated SPARK-12932:
--------------------------------
Description:
When trying to create a Dataset from an RDD of Person (all using the Java API),
I got the error "java.lang.UnsupportedOperationException: no encoder found for
example_java.dataset.Person". This is not a very helpful error, and no other
logging information was apparent to help troubleshoot it.
It turned out that the root cause was that my Person class had neither a default
constructor nor setter methods.
This JIRA is for implementing a more useful error message to help Java
developers who are trying out the Dataset API for the first time.
The full stack trace is:
{code}
Exception in thread "main" java.lang.UnsupportedOperationException: no encoder
found for example_java.common.Person
at
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
at
org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
at
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
at org.apache.spark.sql.Encoders.bean(Encoder.scala)
{code}
NOTE that if I provide EITHER the default constructor OR the setters, but not
both, then I get a stack trace with much more useful information; omitting BOTH
causes this issue.
The original source is below.
{code:title=Example.java}
import java.util.List;

import com.google.common.collect.ImmutableList;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SQLContext;

public class JavaDatasetExample {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setAppName("Example")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(sc);
        List<Person> people = ImmutableList.of(
                new Person("Joe", "Bloggs", 21, "NY")
        );
        Dataset<Person> dataset = sqlContext.createDataset(people,
                Encoders.bean(Person.class));
    }
}
{code}
{code:title=Person.java}
class Person implements Serializable {

    String first;
    String last;
    int age;
    String state;

    public Person() {
    }

    public Person(String first, String last, int age, String state) {
        this.first = first;
        this.last = last;
        this.age = age;
        this.state = state;
    }

    public String getFirst() {
        return first;
    }

    public String getLast() {
        return last;
    }

    public int getAge() {
        return age;
    }

    public String getState() {
        return state;
    }
}
{code}
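For reference, a bean-compliant version of the class above, adding the setter methods whose absence (together with the missing default constructor) triggers the error, would look roughly like the following. This is a sketch of what `Encoders.bean` expects (a public no-arg constructor plus getter/setter pairs), not the class as it appears in the reporter's project:

```java
import java.io.Serializable;

// Bean-compliant variant: public no-arg constructor plus a setter for
// every field, which is what Encoders.bean(Person.class) requires.
public class Person implements Serializable {

    private String first;
    private String last;
    private int age;
    private String state;

    public Person() {
    }

    public String getFirst() { return first; }
    public void setFirst(String first) { this.first = first; }

    public String getLast() { return last; }
    public void setLast(String last) { this.last = last; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public String getState() { return state; }
    public void setState(String state) { this.state = state; }
}
```

With this shape the encoder can both read properties (via getters) and reconstruct instances (via the no-arg constructor and setters).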
> Bad error message with trying to create Dataset from RDD of Java objects that
> are not bean-compliant
> ----------------------------------------------------------------------------------------------------
>
> Key: SPARK-12932
> URL: https://issues.apache.org/jira/browse/SPARK-12932
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 1.6.0
> Environment: Ubuntu 15.10 / Java 8
> Reporter: Andy Grove
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]