Not really related, but note that on Ubuntu 12.04 I had to disable jemalloc; otherwise nodes would randomly die at startup (https://issues.apache.org/jira/browse/CASSANDRA-11723).
Regards,
Stefano

On Thu, Aug 11, 2016 at 10:28 AM, Riccardo Ferrari <ferra...@gmail.com> wrote:

> Hi C* users,
>
> Recently a couple of my nodes crashed (on different dates). I don't have
> core dumps, but my JVM crash logs look like this:
> ===========================================
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f8f608c8e40, pid=6916, tid=140253195458304
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # C [liblz4-java6471621810388748482.so+0x5e40] LZ4_decompress_fast+0xa0
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> ...
> --------------- T H R E A D ---------------
>
> Current thread (0x00007f8f5c7b2d50): JavaThread "CompactionExecutor:11952" daemon [_thread_in_native, id=16219, stack(0x00007f8f3de0d000,0x00007f8f3de4e000)]
> ...
> Stack: [0x00007f8f3de0d000,0x00007f8f3de4e000], sp=0x00007f8f3de4c0e0, free space=252k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C [liblz4-java6471621810388748482.so+0x5e40] LZ4_decompress_fast+0xa0
>
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> J 4150 net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast([BLjava/nio/ByteBuffer;I[BLjava/nio/ByteBuffer;II)I (0 bytes) @ 0x00007f8f791e4723 [0x00007f8f791e4680+0xa3]
> J 19836 C2 org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap()V (354 bytes) @ 0x00007f8f7b714930 [0x00007f8f7b714320+0x610]
> J 6662 C2 org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(Lorg/apache/cassandra/io/sstable/format/SSTableReader;Lorg/apache/cassandra/io/util/FileDataInput;Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/RowIndexEntry;Lorg/apache/cassandra/db/filter/ColumnFilter;Z)V (389 bytes) @ 0x00007f8f79c1cdb8 [0x00007f8f79c1c500+0x8b8]
> J 22393 C2 org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;Z)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator; (818 bytes) @ 0x00007f8f7c1d4364 [0x00007f8f7c1d2f40+0x1424]
> J 22166 C1 org.apache.cassandra.db.Keyspace.indexPartition(Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamilyStore;Ljava/util/Set;)V (274 bytes) @ 0x00007f8f7beb6304 [0x00007f8f7beb5420+0xee4]
> j org.apache.cassandra.index.SecondaryIndexBuilder.build()V+46
> j org.apache.cassandra.db.compaction.CompactionManager$11.run()V+18
> J 22293 C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007f8f7b17727c [0x00007f8f7b176da0+0x4dc]
> J 21302 C2 java.lang.Thread.run()V (17 bytes) @ 0x00007f8f79fe59f8 [0x00007f8f79fe59a0+0x58]
> v ~StubRoutines::call_stub
> ...
> VM state:not at safepoint (normal execution)
>
> VM Mutex/Monitor currently owned by a thread: None
>
> Heap:
> par new generation total 368640K, used 123009K [0x00000006d5e00000, 0x00000006eee00000, 0x00000006eee00000)
>   eden space 327680K, 34% used [0x00000006d5e00000, 0x00000006dcaf35c8, 0x00000006e9e00000)
>   from space 40960K, 27% used [0x00000006e9e00000, 0x00000006ea92cf00, 0x00000006ec600000)
>   to space 40960K, 0% used [0x00000006ec600000, 0x00000006ec600000, 0x00000006eee00000)
> concurrent mark-sweep generation total 3426304K, used 1288977K [0x00000006eee00000, 0x00000007c0000000, 0x00000007c0000000)
> Metaspace used 41685K, capacity 42832K, committed 43156K, reserved 1087488K
>   class space used 4455K, capacity 4702K, committed 4756K, reserved 1048576K
> ...
> OS:DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=12.04
> DISTRIB_CODENAME=precise
> DISTRIB_DESCRIPTION="Ubuntu 12.04.1 LTS"
>
> uname:Linux 3.2.0-35-virtual #55-Ubuntu SMP Wed Dec 5 18:02:05 UTC 2012 x86_64
> libc:glibc 2.15 NPTL 2.15
> rlimit: STACK 8192k, CORE 0k, NPROC 119708, NOFILE 100000, AS infinity
> load average:2.96 1.08 0.60
>
> What am I missing?
> Both crashes seem to happen during compaction and while running native
> code (LZ4).
> Both crashes happen while the nodes are running a scheduled repair (so
> under increased load).
> The machines have 4 vCPUs and 15 GB of RAM (m1.xlarge).
> Any hints?
>
> Best,