[
https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pavel updated SPARK-9033:
-------------------------
Description:
I have a java.util.Map<String, String> field in a POJO class and I am trying to
pass that class to createDataFrame (1.3.1) / applySchema (1.2.2) on the
SQLContext. Both Spark SQL 1.2.2 and 1.3.1 fail with the error below.
Sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
// each text line is split and its parts are assigned to the fields of the Event class
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
The Event class is Serializable and contains a field of type
java.util.Map<String, String>. The same error occurs with Spark Streaming when
it is used together with Spark SQL:
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class); // same error here
    schemaRDD.registerTempTable("events");
    ...
});
Error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
    at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
The same error also occurs for fields whose type is a custom POJO class:
scala.MatchError: class com.test.MyClass (of class java.lang.Class)
    at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
It also occurs for fields of type java.util.Calendar:
scala.MatchError: class java.util.Calendar (of class java.lang.Class)
    at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
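
All three traces fail inside SQLContext.getSchema while it maps over an array, apparently once per bean property, and each MatchError names the type of one property (java.util.Map, a custom class, java.util.Calendar). To find such properties before calling createDataFrame, a minimal sketch using only java.beans.Introspector might look like the following. The class name BeanFieldCheck, the ASSUMED_SUPPORTED set, and the stand-in Event bean are hypothetical and are not taken from Spark or from the reporter's code. The set in particular is an illustrative guess, not Spark's actual list of supported types.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Calendar;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BeanFieldCheck {

    // ASSUMPTION: an illustrative set of "simple" property types.
    // It is not copied from Spark; adjust it for the Spark version in use.
    static final Set<Class<?>> ASSUMED_SUPPORTED = new HashSet<>(Arrays.<Class<?>>asList(
            String.class, BigDecimal.class,
            int.class, Integer.class, long.class, Long.class,
            short.class, Short.class, byte.class, Byte.class,
            float.class, Float.class, double.class, Double.class,
            boolean.class, Boolean.class));

    // Returns "propertyName: typeName" for every bean property whose
    // declared type is not in ASSUMED_SUPPORTED.
    static List<String> unsupportedProperties(Class<?> beanClass) {
        List<String> offenders = new ArrayList<>();
        PropertyDescriptor[] props;
        try {
            props = Introspector.getBeanInfo(beanClass).getPropertyDescriptors();
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
        for (PropertyDescriptor p : props) {
            if ("class".equals(p.getName())) {
                continue; // every object exposes getClass(); not a data field
            }
            Class<?> type = p.getPropertyType(); // null for indexed-only properties
            if (type == null || !ASSUMED_SUPPORTED.contains(type)) {
                offenders.add(p.getName() + ": " + (type == null ? "indexed-only" : type.getName()));
            }
        }
        return offenders;
    }

    // Hypothetical stand-in for the reporter's Event class.
    public static class Event implements Serializable {
        private String name;
        private Map<String, String> attributes;
        private Calendar createdAt;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Map<String, String> getAttributes() { return attributes; }
        public void setAttributes(Map<String, String> attributes) { this.attributes = attributes; }
        public Calendar getCreatedAt() { return createdAt; }
        public void setCreatedAt(Calendar createdAt) { this.createdAt = createdAt; }
    }

    public static void main(String[] args) {
        // Prints one line per property that falls outside the assumed set.
        for (String offender : unsupportedProperties(Event.class)) {
            System.out.println(offender);
        }
    }
}
```

For the stand-in Event bean this reports the attributes (java.util.Map) and createdAt (java.util.Calendar) properties and leaves name alone, which mirrors the types named in the MatchError messages above.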
> scala.MatchError: interface java.util.Map (of class java.lang.Class) with
> Spark SQL
> -----------------------------------------------------------------------------------
>
> Key: SPARK-9033
> URL: https://issues.apache.org/jira/browse/SPARK-9033
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.2, 1.3.1
> Reporter: Pavel