[ https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274535#comment-15274535 ]
Stefano Ortolani commented on CASSANDRA-11723:
----------------------------------------------
I reproduced the crash and managed to capture some output:
{code:xml}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fed3c5edbf0, pid=29072, tid=140528026334976
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libjemalloc.so.1+0x8bf0]
[error occurred during error reporting (printing problematic frame), id 0xb]
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x00007fed11282000): JavaThread "SharedPool-Worker-93" daemon [_thread_new, id=29652, stack(0x0000000000000000,0x0000000000000000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fed948855a8
Registers:
RAX=0x00000000106bccb7, RBX=0x00007fed1129e000, RCX=0x00007fed1129eff0, RDX=0x0000000000000002
RSP=0x00007fcf3b172b50, RBP=0xffffffffffffffe8, RSI=0x00007fed1129e040, RDI=0x0000000000000000
R8 =0x00007fed3b4009c0, R9 =0x0000000000000000, R10=0x00007fcf3b172218, R11=0x0000000000000001
R12=0x0000000000000020, R13=0x0000000000000000, R14=0x0000000000000000, R15=0x0000000000000003
RIP=0x00007fed3c5edbf0, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e
Top of Stack: (sp=0x00007fcf3b172b50)
0x00007fcf3b172b50: 0000000000000000 0000000000000000
0x00007fcf3b172b60: 0000000000000000 0000000000000020
0x00007fcf3b172b70: 0000000000000020 00007fcf3b172d30
0x00007fcf3b172b80: 0000000000000000 00007fed3c5e8da5
0x00007fcf3b172b90: 0000000000000000 00007fcf3b173700
0x00007fcf3b172ba0: 00007fcf3b172d30 00007fed3c3d1afa
0x00007fcf3b172bb0: 0000000000000000 0000000000000000
0x00007fcf3b172bc0: 0000000000000000 0000000000000000
0x00007fcf3b172bd0: 0000000000000000 0000000000000000
0x00007fcf3b172be0: 0000000000000000 0000000000000000
0x00007fcf3b172bf0: 00007fcf3b172da8 00007fcf3b172d90
0x00007fcf3b172c00: 00007fcf3b172da0 00007fcf3b172d30
0x00007fcf3b172c10: 00007fed11282000 00007fed3a546c94
0x00007fcf3b172c20: 0000000000000000 0000000000000000
0x00007fcf3b172c30: 0000000000000000 0000000000000000
0x00007fcf3b172c40: 0000000000000000 0000000000000000
0x00007fcf3b172c50: 0000000000000000 0000000000000000
0x00007fcf3b172c60: 0000000000000000 0000000000000000
0x00007fcf3b172c70: 0000000000000000 0000000000000000
0x00007fcf3b172c80: 0000000000000000 0000000000000000
0x00007fcf3b172c90: 0000000000000000 0000000000000000
0x00007fcf3b172ca0: 0000000000000000 0000000000000000
0x00007fcf3b172cb0: 0000000000000000 0000000000000000
0x00007fcf3b172cc0: 0000000000000000 0000000000000000
0x00007fcf3b172cd0: 0000000000000000 0000000000000000
0x00007fcf3b172ce0: 0000000000000000 0000000000000000
0x00007fcf3b172cf0: 0000000000000000 00007fed11294728
0x00007fcf3b172d00: 0000000000000000 00007fed3c3d3104
0x00007fcf3b172d10: 0000000000000000 0000000000000000
0x00007fcf3b172d20: 0000000000000000 0000000000000000
0x00007fcf3b172d30: 0000000000000000 0000000000000009
0x00007fcf3b172d40: 0000000000000000 00007fcf3b174000
Instructions: (pc=0x00007fed3c5edbf0)
0x00007fed3c5edbd0: 08 00 00 00 00 e9 c4 fe ff ff 66 0f 1f 44 00 00
0x00007fed3c5edbe0: 83 e8 01 3b 06 89 46 08 7d 02 89 06 48 8b 4e 10
0x00007fed3c5edbf0: 48 8b 2c c1 48 85 ed 0f 85 13 ff ff ff e9 f5 fe
0x00007fed3c5edc00: ff ff 66 0f 1f 44 00 00 e8 a3 ae 00 00 0f 1f 00
Register to memory mapping:
RAX=0x00000000106bccb7 is an unknown value
RBX=0x00007fed1129e000 is an unknown value
RCX=0x00007fed1129eff0 is an unknown value
RDX=0x0000000000000002 is an unknown value
RSP=0x00007fcf3b172b50 is an unknown value
RBP=0xffffffffffffffe8 is an unknown value
RSI=0x00007fed1129e040 is an unknown value
RDI=0x0000000000000000 is an unknown value
R8 =0x00007fed3b4009c0 is an unknown value
R9 =0x0000000000000000 is an unknown value
R10=0x00007fcf3b172218 is an unknown value
R11=0x0000000000000001 is an unknown value
R12=0x0000000000000020 is an unknown value
R13=0x0000000000000000 is an unknown value
R14=0x0000000000000000 is an unknown value
R15=0x0000000000000003 is an unknown value
Stack: [0x0000000000000000,0x0000000000000000], sp=0x00007fcf3b172b50, free space=137234400714k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libjemalloc.so.1+0x8bf0]
[error occurred during error reporting (printing native stack), id 0xb]
{code}
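The faulting address in the dump above can be cross-checked against the register values. The bytes at pc (48 8b 2c c1) appear to decode as `mov rbp, [rcx+rax*8]`; that decoding is a manual reading of the x86-64 encoding and is not reported by the JVM. Under that assumption, si_addr should equal RCX + RAX*8. A minimal sketch of the arithmetic (the helper name `effective_address` is purely illustrative):

```python
# Register values and fault address copied from the hs_err output above.
RCX = 0x00007fed1129eff0      # assumed base register of the load
RAX = 0x00000000106bccb7      # assumed index register of the load
SI_ADDR = 0x00007fed948855a8  # si_addr reported by siginfo


def effective_address(base: int, index: int, scale: int) -> int:
    """Address of a [base + index*scale] operand, wrapped to 64 bits."""
    return (base + index * scale) & 0xFFFFFFFFFFFFFFFF


fault = effective_address(RCX, RAX, 8)
print(hex(fault), fault == SI_ADDR)  # prints: 0x7fed948855a8 True
```

The computed address matches si_addr, so the bad access seems to come from the index in RAX (0x106bccb7) rather than from the base pointer. This is only an inference from the dump, not a confirmed diagnosis.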
> Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes
> --------------------------------------------------------------
>
> Key: CASSANDRA-11723
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11723
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefano Ortolani
>
> The upgrade itself seems fine, but any restart of a node can lead to a
> situation where the node just dies after 30 seconds to 1 minute.
> There is nothing in the logs besides many "FailureDetector.java:456 - Ignoring
> interval time of 3000892567 for /10.12.a.x" messages, output every second (for
> all other nodes) in debug.log, plus some spurious GraphiteErrors/ReadRepair
> notifications:
> {code:xml}
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - Ignoring interval time of 2373187360 for /10.12.a.x
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - Ignoring interval time of 2000276196 for /10.12.a.y
> DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) (d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718)
>     at org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - Ignoring interval time of 3000299340 for /10.12.33.5
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
>     at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252) ~[metrics-graphite-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166) ~[metrics-graphite-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_60]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_60]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {code}
> I know this is not much, but nothing else shows up in dmesg or in any other log.
> Any suggestions on how to debug this further?
> I have upgraded two nodes so far, and it happened on both.
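To put the "Ignoring interval time" values from the description in perspective, they can be converted to seconds. The sketch below assumes the logged number is a nanosecond interval; the log lines themselves do not state the unit. The function name `ignored_intervals` is illustrative only.

```python
import re

# Sample lines copied from the debug.log excerpt in the issue description
# (re-joined after mail line wrapping).
LOG = """\
DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - Ignoring interval time of 2373187360 for /10.12.a.x
DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - Ignoring interval time of 2000276196 for /10.12.a.y
DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - Ignoring interval time of 3000299340 for /10.12.33.5
"""

PATTERN = re.compile(r"Ignoring interval time of (\d+) for (\S+)")


def ignored_intervals(text):
    """Return (endpoint, seconds) pairs, ASSUMING the logged value is in nanoseconds."""
    return [(endpoint, int(nanos) / 1e9) for nanos, endpoint in PATTERN.findall(text)]


for endpoint, seconds in ignored_intervals(LOG):
    print(f"{endpoint}: {seconds:.3f} s")
```

Under that assumption, the ignored intervals are all roughly 2 to 3 seconds.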