Yin Huai created SPARK-5789:
-------------------------------
Summary: Throw a better error message if JsonRDD.parseJson encounters unrecoverable parsing errors.
Key: SPARK-5789
URL: https://issues.apache.org/jira/browse/SPARK-5789
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Yin Huai
For example, parsing a malformed record (the opening brace is missing)
{code}
sqlContext.jsonRDD(sc.parallelize(""""a":1}"""::Nil))
{code}
throws a scala.MatchError with no indication that the input is malformed JSON:
{code}
scala.MatchError: a (of class java.lang.String)
at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:302)
at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:300)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:879)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:878)
at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/12 15:08:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 26) in 10 ms on localhost (7/8)
15/02/12 15:08:55 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 4.0 (TID 33, localhost): scala.MatchError: a (of class java.lang.String)
at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:302)
at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:300)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:879)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:878)
at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
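One possible shape for the improvement, as a rough sketch only and not the actual fix: catch the low-level failure for each record and rethrow it with the offending record in the message. Everything named here (BetterJsonError, MalformedRecordException, parseWithContext, toyParse) is hypothetical and not part of Spark; toyParse is only a stand-in that fails with a MatchError the same way the report above does.
{code}
import scala.util.control.NonFatal

object BetterJsonError {
  // Hypothetical exception type that carries the record that could not be parsed.
  final class MalformedRecordException(val record: String, cause: Throwable)
    extends RuntimeException(s"Failed to parse record as a JSON object: $record", cause)

  // Runs `parse` on one record; any non-fatal failure (including
  // scala.MatchError) is rethrown with the record attached.
  def parseWithContext[A](record: String)(parse: String => A): A =
    try parse(record)
    catch {
      case NonFatal(e) => throw new MalformedRecordException(record, e)
    }

  // Toy stand-in for the real parser: accepts only input that starts with '{'
  // and fails with a MatchError on anything else.
  def toyParse(record: String): String = record.trim.take(1) match {
    case "{" => record
  }

  def main(args: Array[String]): Unit = {
    val bad = """"a":1}"""
    try println(parseWithContext(bad)(toyParse))
    catch {
      // prints: Failed to parse record as a JSON object: "a":1}
      case e: MalformedRecordException => println(e.getMessage)
    }
  }
}
{code}
The original MatchError is kept as the cause, so the full stack trace is still available while the top-level message names the bad input.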