[
https://issues.apache.org/jira/browse/SPARK-17952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amit Baghel updated SPARK-17952:
--------------------------------
Description:
As per the latest Spark documentation for Java at
http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection:
{quote}
Nested JavaBeans and List or Array fields are supported though.
{quote}
However, a nested JavaBean does not work. Please see the code below.
SubCategory class
{code}
import java.io.Serializable;

public class SubCategory implements Serializable {
    private String id;
    private String name;

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}
{code}
Category class
{code}
import java.io.Serializable;

public class Category implements Serializable {
    private String id;
    private SubCategory subCategory;

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public SubCategory getSubCategory() {
        return subCategory;
    }
    public void setSubCategory(SubCategory subCategory) {
        this.subCategory = subCategory;
    }
}
{code}
SparkSample class
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
    public static void main(String[] args) throws IOException {
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkSample")
                .master("local")
                .getOrCreate();
        // SubCategory
        SubCategory sub = new SubCategory();
        sub.setId("sc-111");
        sub.setName("Sub-1");
        // Category with a nested SubCategory
        Category category = new Category();
        category.setId("s-111");
        category.setSubCategory(sub);
        // categoryList
        List<Category> categoryList = new ArrayList<Category>();
        categoryList.add(category);
        // DataFrame from the bean class
        Dataset<Row> dframe = spark.createDataFrame(categoryList, Category.class);
        dframe.show();
    }
}
{code}
The above code throws the following error.
{code}
Exception in thread "main" scala.MatchError: com.sample.SubCategory@e7391d (of class com.sample.SubCategory)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:256)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:403)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1106)
	at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1104)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
	at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
	at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
	at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
	at com.sample.SparkSample.main(SparkSample.java:33)
{code}
The createDataFrame method throws the above exception. However, I observed that the createDataset method works fine with the following code.
{code}
Encoder<Category> encoder = Encoders.bean(Category.class);
Dataset<Category> dframe = spark.createDataset(categoryList, encoder);
dframe.show();
{code}
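For what it's worth, the nested property itself is visible to standard JavaBean reflection, which is what the documented bean-based schema inference relies on, so the bean definitions appear to be fine. A minimal self-contained sketch (plain JDK, no Spark; the stripped-down beans here are illustrative stand-ins for the classes above):

{code}
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class BeanIntrospection {

    // Stripped-down stand-ins for the Category/SubCategory beans above.
    public static class SubCategory {
        private String id;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
    }

    public static class Category {
        private SubCategory subCategory;
        public SubCategory getSubCategory() { return subCategory; }
        public void setSubCategory(SubCategory subCategory) { this.subCategory = subCategory; }
    }

    // Looks up the declared type of a bean property via the standard Introspector.
    public static Class<?> propertyType(Class<?> beanClass, String name) throws IntrospectionException {
        BeanInfo info = Introspector.getBeanInfo(beanClass);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getName().equals(name)) {
                return pd.getPropertyType();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IntrospectionException {
        // The nested bean property is discovered by plain JavaBean reflection.
        System.out.println(propertyType(Category.class, "subCategory").getSimpleName());
    }
}
{code}

This prints SubCategory, i.e. the nested bean type is correctly exposed as a property, which suggests the failure is in createDataFrame's conversion of the nested value rather than in property discovery.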
> Java SparkSession createDataFrame method throws exception for nested JavaBeans
> ------------------------------------------------------------------------------
>
> Key: SPARK-17952
> URL: https://issues.apache.org/jira/browse/SPARK-17952
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Amit Baghel
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]