Yin Huai created SPARK-5789:
-------------------------------

             Summary: Throw a better error message if JsonRDD.parseJson encounters unrecoverable parsing errors.
                 Key: SPARK-5789
                 URL: https://issues.apache.org/jira/browse/SPARK-5789
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Yin Huai


For example, the following call passes a malformed JSON record (the opening brace is missing):
{code}
sqlContext.jsonRDD(sc.parallelize(""""a":1}"""::Nil))
{code}
will throw
{code}
scala.MatchError: a (of class java.lang.String)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:302)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:300)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:879)
        at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:878)
        at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
        at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/02/12 15:08:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 26) in 10 ms on localhost (7/8)
15/02/12 15:08:55 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 4.0 (TID 33, localhost): scala.MatchError: a (of class java.lang.String)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:302)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:300)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:879)
        at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:878)
        at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
        at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}
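
One possible shape for the improvement, as a minimal sketch only: the trace above suggests that a pattern match inside parseJson has no case for a bare String, so the failure surfaces as a MatchError that names neither the problem nor the offending record. The object ParseJsonSketch and the method toRows below are hypothetical stand-ins, not the actual JsonRDD code, and the sketch uses plain Scala collections in place of whatever the real parser returns.

```scala
// Hypothetical illustration; none of these names come from Spark itself.
object ParseJsonSketch {

  /**
   * Turns one already-parsed top-level value into rows.
   * `record` is the raw input text, kept only so the error can quote it.
   */
  def toRows(record: String, parsed: Any): Seq[Map[String, Any]] = parsed match {
    // A JSON object becomes a single row.
    case m: Map[_, _] => Seq(m.asInstanceOf[Map[String, Any]])
    // A JSON array of objects becomes one row per element.
    case s: Seq[_] => s.map(_.asInstanceOf[Map[String, Any]])
    // Anything else (a bare string, a number, null) is not a usable record.
    // Without this catch-all case the match would fail with scala.MatchError.
    case other =>
      val cls = if (other == null) "null" else other.getClass.getName
      throw new IllegalArgumentException(
        s"Failed to parse record as a JSON object or array: " +
          s"got $cls ($other) from input: $record")
  }
}
```

With a catch-all case of this kind, the malformed record from the example would be reported together with the type of the unexpected value, rather than as a MatchError on the string a.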



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
