Ihor Bobak created SPARK-7603:
---------------------------------

             Summary: Crash of thrift server when doing SQL without "limit"
                 Key: SPARK-7603
                 URL: https://issues.apache.org/jira/browse/SPARK-7603
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.3.1
         Environment: Hortonworks Sandbox 2.1  with Spark 1.3.1
            Reporter: Ihor Bobak


I have 2 tables in Hive: one with 120 thousand records, and another that is 5 
times smaller. 

I'm running a standalone cluster on a single VM, and I start the thrift server 
with the command:
./start-thriftserver.sh --conf spark.executor.memory=2048m --conf spark.driver.memory=1024m

My spark-defaults.conf contains:
spark.master                     spark://sandbox.hortonworks.com:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://sandbox.hortonworks.com:8020/user/pdi/spark/logs


So, when I run the SQL 

select <some fields from header>, <some fields from details>
from  
        vw_salesorderdetail as d 
        left join vw_salesorderheader as h on h.SalesOrderID = d.SalesOrderID 
limit 2000000000;

everything is fine, even though the limit is unrealistically high (again: the 
result set returned is just 120000 records).

But if I run the same query without the limit clause, execution hangs - see 
here: http://postimg.org/image/fujdjd16f/42945a78/

and the thrift server logs fill up with exceptions:

15/05/13 17:59:27 INFO TaskSetManager: Starting task 158.0 in stage 48.0 (TID 
953, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
15/05/13 18:00:01 INFO TaskSetManager: Finished task 150.0 in stage 48.0 (TID 
945) in 36166 ms on sandbox.hortonworks.com (152/200)
15/05/13 18:00:02 ERROR Utils: Uncaught exception in thread Spark Context 
Cleaner
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:147)
        at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
        at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
        at 
org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:143)
        at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
Exception in thread "Spark Context Cleaner" 15/05/13 18:00:02 ERROR Utils: 
Uncaught exception in thread task-result-getter-1
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.<init>(String.java:315)
        at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:562)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
        at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
        at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
        at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:173)
        at 
org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
        at 
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:621)
        at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:379)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Exception in thread "task-result-getter-1" 15/05/13 18:00:04 INFO 
TaskSetManager: Starting task 159.0 in stage 48.0 (TID 954, 
sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:147)
        at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
        at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
        at 
org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:143)
        at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.<init>(String.java:315)
        at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:562)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
        at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
        at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
        at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:173)
        at 
org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
        at 
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:621)
        at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:379)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
        at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
15/05/13 18:00:05 INFO TaskSetManager: Finished task 154.0 in stage 48.0 (TID 
949) in 40665 ms on sandbox.hortonworks.com (153/200)
15/05/13 18:00:20 ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError: GC 
overhead limit exceeded
15/05/13 18:00:28 ERROR Utils: Uncaught exception in thread task-result-getter-2
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError: GC 
overhead limit exceeded
15/05/13 18:00:29 INFO TaskSetManager: Starting task 160.0 in stage 48.0 (TID 
955, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
15/05/13 18:00:31 ERROR ActorSystemImpl: exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at 
akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
        at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
        at java.lang.Thread.run(Thread.java:744)
15/05/13 18:00:31 INFO ActorSystemImpl: starting new LARS thread
15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem 
[sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
        at java.lang.Class.getDeclaredMethod(Class.java:2002)
        at 
java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
        at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at 
akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
        at 
akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at 
akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
        at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
        at java.lang.Thread.run(Thread.java:744)
15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem 
[sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
        at java.lang.Class.getDeclaredMethod(Class.java:2002)
        at 
java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
        at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at 
akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)




At the same time the thrift server keeps producing tons of log output like the 
above. 

Feel free to contact me - I will send you the full logs. 
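
As a rough sanity check on the numbers in this report: the task-result-getter 
and sparkDriver frames in the traces above belong to the driver JVM, which was 
started with spark.driver.memory=1024m. The sketch below only divides that heap 
by the 120000 result rows. It assumes, unrealistically, that the whole heap is 
available for result rows, so the real per-row budget is smaller. The function 
and variable names are made up for illustration.

```python
# Back-of-the-envelope estimate; the figures come from the report above.
DRIVER_HEAP_MIB = 1024   # from --conf spark.driver.memory=1024m
RESULT_ROWS = 120_000    # size of the joined result set


def per_row_budget_bytes(heap_mib: int, rows: int) -> int:
    """Upper bound on bytes per row if the entire heap held nothing but rows."""
    return (heap_mib * 1024 * 1024) // rows


budget = per_row_budget_bytes(DRIVER_HEAP_MIB, RESULT_ROWS)
print(f"at most ~{budget} bytes of driver heap per result row")
```

This is an upper bound only: JVM object overhead, the scheduler, and the thrift 
server itself also live in the same heap.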





