[ https://issues.apache.org/jira/browse/SPARK-14096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252597#comment-15252597 ]
JESSE CHEN commented on SPARK-14096:
------------------------------------

But the simplest workaround for now is to set spark.serializer to org.apache.spark.serializer.JavaSerializer.

> SPARK-SQL CLI returns NPE
> -------------------------
>
>                 Key: SPARK-14096
>                 URL: https://issues.apache.org/jira/browse/SPARK-14096
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: JESSE CHEN
>
> Running TPCDS query 06 in the spark-sql shell failed with the following error in the middle of a stage, whereas running another query, 38, succeeded:
> NPE:
> {noformat}
> 16/03/22 15:12:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
> 16/03/22 15:12:56 INFO scheduler.TaskSetManager: Finished task 65.0 in stage 10.0 (TID 622) in 171 ms on localhost (30/200)
> 16/03/22 15:12:56 ERROR scheduler.TaskResultGetter: Exception while getting task result
> com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
> Serialization trace:
> underlying (org.apache.spark.util.BoundedPriorityQueue)
>       at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
>       at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>       at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>       at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
>       at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
>       at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>       at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312)
>       at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
>       at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1790)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>       at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157)
>       at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148)
>       at scala.math.Ordering$$anon$4.compare(Ordering.scala:111)
>       at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:669)
>       at java.util.PriorityQueue.siftUp(PriorityQueue.java:645)
>       at java.util.PriorityQueue.offer(PriorityQueue.java:344)
>       at java.util.PriorityQueue.add(PriorityQueue.java:321)
>       at com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78)
>       at com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31)
>       at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>       at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>       ... 15 more
> 16/03/22 15:12:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
> 16/03/22 15:12:56 INFO scheduler.TaskSetManager: Finished task 66.0 in stage 10.0 (TID 623) in 171 ms on localhost (31/200)
> 16/03/22 15:12:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
> {noformat}
> query 06 (caused the above NPE):
> {noformat}
> select a.ca_state state, count(*) cnt
> from customer_address a
> join customer c on a.ca_address_sk = c.c_current_addr_sk
> join store_sales s on c.c_customer_sk = s.ss_customer_sk
> join date_dim d on s.ss_sold_date_sk = d.d_date_sk
> join item i on s.ss_item_sk = i.i_item_sk
> join (select distinct d_month_seq
>       from date_dim
>       where d_year = 2001
>         and d_moy = 1 ) tmp1 ON d.d_month_seq = tmp1.d_month_seq
> join
>      (select j.i_category, avg(j.i_current_price) as avg_i_current_price
>       from item j group by j.i_category) tmp2 on tmp2.i_category = i.i_category
> where
>   i.i_current_price > 1.2 * tmp2.avg_i_current_price
> group by a.ca_state
> having count(*) >= 10
> order by cnt
> limit 100;
> {noformat}
> query 38 (succeeded):
> {noformat}
> select count(*) from (
>     select distinct c_last_name, c_first_name, d_date
>     from store_sales, date_dim, customer
>     where store_sales.ss_sold_date_sk = date_dim.d_date_sk
>       and store_sales.ss_customer_sk = customer.c_customer_sk
>       and d_month_seq between 1200 and 1200 + 11
>   intersect
>     select distinct c_last_name, c_first_name, d_date
>     from catalog_sales, date_dim, customer
>     where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>       and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>       and d_month_seq between 1200 and 1200 + 11
>   intersect
>     select distinct c_last_name, c_first_name, d_date
>     from web_sales, date_dim, customer
>     where web_sales.ws_sold_date_sk = date_dim.d_date_sk
>       and web_sales.ws_bill_customer_sk = customer.c_customer_sk
>       and d_month_seq between 1200 and 1200 + 11
> ) hot_cust
> limit 100;
> {noformat}
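
For anyone wanting to try the serializer workaround from this comment, here is a minimal sketch of two ways it could be applied. The trace shows the failure happening inside KryoSerializerInstance.deserialize, so the idea is simply to switch spark.serializer away from Kryo. The file name query06.sql and the SPARK_HOME layout are assumptions for illustration, not details taken from the report.

```shell
# Option 1: override the serializer for a single spark-sql invocation.
# query06.sql is a hypothetical file holding the failing TPCDS query.
"$SPARK_HOME"/bin/spark-sql \
  --conf spark.serializer=org.apache.spark.serializer.JavaSerializer \
  -f query06.sql

# Option 2: make it the default for every job started from this install
# by adding the following line to $SPARK_HOME/conf/spark-defaults.conf:
#
#   spark.serializer  org.apache.spark.serializer.JavaSerializer
```

Java serialization is generally slower and bulkier than Kryo, so this is probably best treated as a stopgap until the underlying bug is fixed.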