Hi Community,
I am using Spark 1.0.2, running Hive SQL queries through Spark SQL.
When I run the following word-count code in the Spark shell:
val file = sc.textFile("./README.md")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
it runs correctly, with no errors.
When I run the following code:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("SHOW TABLES").collect().foreach(println)
it also runs correctly, with no errors.
But when I run:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("SELECT COUNT(*) from uservisits").collect().foreach(println)
it fails. The error I see is the following:
14/10/09 19:47:34 ERROR Executor: Exception in task ID 4
java.lang.NullPointerException
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
14/10/09 19:47:34 INFO CoarseGrainedExecutorBackend: Got assigned task 5
14/10/09 19:47:34 INFO Executor: Running task ID 5
14/10/09 19:47:34 DEBUG BlockManager: Getting local block broadcast_1
14/10/09 19:47:34 DEBUG BlockManager: Level for block broadcast_1 is StorageLevel(true, true, false, true, 1)
14/10/09 19:47:34 DEBUG BlockManager: Getting block broadcast_1 from memory
14/10/09 19:47:34 INFO BlockManager: Found block broadcast_1 locally
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/10/09 19:47:34 DEBUG BlockFetcherIterator$BasicBlockFetcherIterator: Sending request for 2 blocks (2.5 KB) from node19:50868
14/10/09 19:47:34 DEBUG BlockMessageArray: Adding BlockMessage [type = 1, id = shuffle_0_0_1, level = null, data = null]
14/10/09 19:47:34 DEBUG BlockMessageArray: Added BufferMessage(id = 5, size = 34)
14/10/09 19:47:34 DEBUG BlockMessageArray: Adding BlockMessage [type = 1, id = shuffle_0_1_1, level = null, data = null]
14/10/09 19:47:34 DEBUG BlockMessageArray: Added BufferMessage(id = 6, size = 34)
14/10/09 19:47:34 DEBUG BlockMessageArray: Buffer list:
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 2 ms
14/10/09 19:47:34 DEBUG BlockFetcherIterator$BasicBlockFetcherIterator: Got local blocks in 0 ms
Tasks 5 and 6 then fail with the same java.lang.NullPointerException at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594), with log output identical to the above except for the task and buffer-message IDs.
What could cause this? Is it a known problem?
Chen Weikeng