[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15331370#comment-15331370 ]
Pete Robbins commented on SPARK-15822:
--------------------------------------
After chatting with [~hvanhovell], here is the current state. I can reproduce a
segv using local[8] on an 8-core machine. It is intermittent, but many runs
with e.g. local[2] produce no issues. The segv info is:
{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fe8c118ca58, pid=3558, tid=140633451779840
#
# JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64
compressed oops)
# Problematic frame:
# J 7467 C1 org.apache.spark.unsafe.Platform.getByte(Ljava/lang/Object;J)B (9
bytes) @ 0x00007fe8c118ca58 [0x00007fe8c118ca20+0x38]
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x00007fe858018800): JavaThread "Executor task launch
worker-3" daemon [_thread_in_Java, id=3698,
stack(0x00007fe7c6dfd000,0x00007fe7c6efe000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr:
0x0000000000a09cf4
Registers:
RAX=0x00007fe884ce5828, RBX=0x00007fe884ce5828, RCX=0x00007fe81e0a5360,
RDX=0x0000000000a09cf4
RSP=0x00007fe7c6efb9e0, RBP=0x00007fe7c6efba80, RSI=0x0000000000000000,
RDI=0x0000000000003848
R8 =0x00000000200b94c8, R9 =0x00000000eef66bf0, R10=0x00007fe8d87a2f00,
R11=0x00007fe8c118ca20
R12=0x0000000000000000, R13=0x00007fe7c6efba28, R14=0x00007fe7c6efba98,
R15=0x00007fe858018800
RIP=0x00007fe8c118ca58, EFLAGS=0x0000000000010206, CSGSFS=0x0000000000000033,
ERR=0x0000000000000004
TRAPNO=0x000000000000000e
Top of Stack: (sp=0x00007fe7c6efb9e0)
0x00007fe7c6efb9e0: 00007fe7c56941e8 0000000000000000
0x00007fe7c6efb9f0: 00007fe7c6efbab0 00007fe8c140c38c
0x00007fe7c6efba00: 00007fe8c1007d80 00000000eef66bc8
0x00007fe7c6efba10: 00007fe7c6efba80 00007fe8c1007700
0x00007fe7c6efba20: 00007fe8c1007700 0000000000a09cf4
0x00007fe7c6efba30: 0000000000000030 0000000000000000
0x00007fe7c6efba40: 00007fe7c6efba40 00007fe81e0a1f9b
0x00007fe7c6efba50: 00007fe7c6efba98 00007fe81e0a5360
0x00007fe7c6efba60: 0000000000000000 00007fe81e0a1fc0
0x00007fe7c6efba70: 00007fe7c6efba28 00007fe7c6efba90
0x00007fe7c6efba80: 00007fe7c6efbae8 00007fe8c1007700
0x00007fe7c6efba90: 0000000000000000 00000000ee4f4898
0x00007fe7c6efbaa0: 000000000000004d 00007fe7c6efbaa8
0x00007fe7c6efbab0: 00007fe81e0a42be 00007fe7c6efbb18
0x00007fe7c6efbac0: 00007fe81e0a5360 0000000000000000
0x00007fe7c6efbad0: 00007fe81e0a4338 00007fe7c6efba90
0x00007fe7c6efbae0: 00007fe7c6efbb10 00007fe7c6efbb60
0x00007fe7c6efbaf0: 00007fe8c1007a40 0000000000000000
0x00007fe7c6efbb00: 0000000000000000 0000000000000003
0x00007fe7c6efbb10: 00000000ee4f4898 00000000eef67950
0x00007fe7c6efbb20: 00007fe7c6efbb20 00007fe81e0a43f2
0x00007fe7c6efbb30: 00007fe7c6efbb78 00007fe81e0a5360
0x00007fe7c6efbb40: 0000000000000000 00007fe81e0a4418
0x00007fe7c6efbb50: 00007fe7c6efbb10 00007fe7c6efbb70
0x00007fe7c6efbb60: 00007fe7c6efbbc0 00007fe8c1007a40
0x00007fe7c6efbb70: 00000000ee4f4898 00000000eef67950
0x00007fe7c6efbb80: 00007fe7c6efbb80 00007fe7c56844e5
0x00007fe7c6efbb90: 00007fe7c6efbc28 00007fe7c5684950
0x00007fe7c6efbba0: 0000000000000000 00007fe7c5684618
0x00007fe7c6efbbb0: 00007fe7c6efbb70 00007fe7c6efbc18
0x00007fe7c6efbbc0: 00007fe7c6efbc70 00007fe8c10077d0
0x00007fe7c6efbbd0: 0000000000000000 0000000000000000
Instructions: (pc=0x00007fe8c118ca58)
0x00007fe8c118ca38: 08 83 c7 08 89 78 08 48 b8 28 58 ce 84 e8 7f 00
0x00007fe8c118ca48: 00 81 e7 f8 3f 00 00 83 ff 00 0f 84 16 00 00 00
0x00007fe8c118ca58: 0f be 04 16 c1 e0 18 c1 f8 18 48 83 c4 30 5d 85
0x00007fe8c118ca68: 05 93 c6 85 17 c3 48 89 44 24 08 48 c7 04 24 ff
Register to memory mapping:
RAX={method} {0x00007fe884ce5828} 'getByte' '(Ljava/lang/Object;J)B' in
'org/apache/spark/unsafe/Platform'
RBX={method} {0x00007fe884ce5828} 'getByte' '(Ljava/lang/Object;J)B' in
'org/apache/spark/unsafe/Platform'
RCX=0x00007fe81e0a5360 is pointing into metadata
RDX=0x0000000000a09cf4 is an unknown value
RSP=0x00007fe7c6efb9e0 is pointing into the stack for thread: 0x00007fe858018800
RBP=0x00007fe7c6efba80 is pointing into the stack for thread: 0x00007fe858018800
RSI=0x0000000000000000 is an unknown value
RDI=0x0000000000003848 is an unknown value
R8 =0x00000000200b94c8 is an unknown value
R9 =0x00000000eef66bf0 is an oop
[B
- klass: {type array byte}
- length: 48
R10=0x00007fe8d87a2f00: <offset 0xf07f00> in
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el6_7.x86_64/jre/lib/amd64/server/libjvm.so
at 0x00007fe8d789b000
R11=0x00007fe8c118ca20 is at entry_point+0 in (nmethod*)0x00007fe8c118c8d0
R12=0x0000000000000000 is an unknown value
R13=0x00007fe7c6efba28 is pointing into the stack for thread: 0x00007fe858018800
R14=0x00007fe7c6efba98 is pointing into the stack for thread: 0x00007fe858018800
R15=0x00007fe858018800 is a thread
Stack: [0x00007fe7c6dfd000,0x00007fe7c6efe000], sp=0x00007fe7c6efb9e0, free
space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 7467 C1 org.apache.spark.unsafe.Platform.getByte(Ljava/lang/Object;J)B (9
bytes) @ 0x00007fe8c118ca58 [0x00007fe8c118ca20+0x38]
j org.apache.spark.unsafe.types.UTF8String.getByte(I)B+11
j
org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I+30
j
org.apache.spark.unsafe.types.UTF8String.compare(Lorg/apache/spark/unsafe/types/UTF8String;)I+2
j
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;Lscala/collection/Iterator;Lscala/collection/Iterator;)Z+141
j
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext()V+410
J 7729 C1 org.apache.spark.sql.execution.BufferedRowIterator.hasNext()Z (30
bytes) @ 0x00007fe8c1ad80d4 [0x00007fe8c1ad7e60+0x274]
j
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$3$$anon$2.hasNext()Z+4
J 8582 C2 scala.collection.Iterator$$anon$11.hasNext()Z (10 bytes) @
0x00007fe8c2506bd8 [0x00007fe8c2506760+0x478]
j scala.collection.convert.Wrappers$IteratorWrapper.hasNext()Z+4
j
org.spark_project.guava.collect.Ordering.leastOf(Ljava/util/Iterator;I)Ljava/util/List;+132
j
org.apache.spark.util.collection.Utils$.takeOrdered(Lscala/collection/Iterator;ILscala/math/Ordering;)Lscala/collection/Iterator;+29
j
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(Lscala/collection/Iterator;)Lscala/collection/Iterator;+46
j
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(Ljava/lang/Object;)Ljava/lang/Object;+5
j
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(Lorg/apache/spark/TaskContext;ILscala/collection/Iterator;)Lscala/collection/Iterator;+5
j
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+13
j
org.apache.spark.rdd.MapPartitionsRDD.compute(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+27
j
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+26
j
org.apache.spark.rdd.RDD.iterator(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+33
j
org.apache.spark.scheduler.ResultTask.runTask(Lorg/apache/spark/TaskContext;)Ljava/lang/Object;+136
j
org.apache.spark.scheduler.Task.run(JILorg/apache/spark/metrics/MetricsSystem;)Ljava/lang/Object;+82
j org.apache.spark.executor.Executor$TaskRunner.run()V+374
j
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j java.lang.Thread.run()V+11
{noformat}
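As a sanity check on the dump above, the bytes at the faulting pc (0f be 04 16) appear to decode to a sign-extending byte load from RSI + RDX, and RSI is 0 while RDX equals si_addr. One possible reading is that getByte was called with a null base object and 0xa09cf4 as the offset, but that is an interpretation of the dump and not something confirmed here. Below is a minimal Java sketch of the arithmetic; the class and method names (FaultAddressCheck, parseRegisters, effectiveAddress) are made up for illustration and are not part of Spark or the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FaultAddressCheck {
    // Matches entries such as "RSI=0x0000000000000000" or "R8 =0x00000000200b94c8".
    private static final Pattern REGISTER =
        Pattern.compile("\\b([A-Z][A-Z0-9]{1,2})\\s*=0x([0-9a-fA-F]{16})");

    /** Parses register name/value pairs out of an hs_err "Registers:" section. */
    static Map<String, Long> parseRegisters(String dump) {
        Map<String, Long> regs = new LinkedHashMap<>();
        Matcher m = REGISTER.matcher(dump);
        while (m.find()) {
            regs.put(m.group(1), Long.parseUnsignedLong(m.group(2), 16));
        }
        return regs;
    }

    /** Effective address of a base + index load with scale 1 and no displacement. */
    static long effectiveAddress(long base, long index) {
        return base + index;
    }

    public static void main(String[] args) {
        // Values copied from the Registers and siginfo sections of the dump above.
        String dump =
            "RAX=0x00007fe884ce5828, RBX=0x00007fe884ce5828, RCX=0x00007fe81e0a5360, "
            + "RDX=0x0000000000a09cf4 "
            + "RSP=0x00007fe7c6efb9e0, RBP=0x00007fe7c6efba80, RSI=0x0000000000000000, "
            + "RDI=0x0000000000003848";
        long siAddr = 0x0000000000a09cf4L;

        Map<String, Long> regs = parseRegisters(dump);
        // Assumption: the faulting load addresses memory as RSI + RDX.
        long addr = effectiveAddress(regs.get("RSI"), regs.get("RDX"));
        System.out.println(addr == siAddr); // prints "true"
    }
}
```

The same check can be pointed at another hs_err file by pasting in its Registers section and si_addr value.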
> segmentation violation in o.a.s.unsafe.types.UTF8String
> --------------------------------------------------------
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> Reporter: Pete Robbins
> Assignee: Herman van Hovell
> Priority: Blocker
>
> Executors fail with a segmentation violation while running an application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64
> compressed oops)
> # Problematic frame:
> # J 4816 C2
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
> (64 bytes) @ 0x00007f4559b4d4bd [0x00007f4559b4d460+0x5d]
> {noformat}
> We initially saw this with IBM Java on a PowerPC box, but it can be recreated
> on Linux with OpenJDK. On Linux with IBM Java 8 we see a NullPointerException
> at the same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
> at
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
> at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
> at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
> Source)
> at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
> Source)
> at
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
> at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
> at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
> at
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
> at
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(Thread.java:785)
> {noformat}