[ 
https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274535#comment-15274535
 ] 

Stefano Ortolani commented on CASSANDRA-11723:
----------------------------------------------

I reproduced the crash and managed to capture the JVM's fatal error output:

{code:xml}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fed3c5edbf0, pid=29072, tid=140528026334976
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libjemalloc.so.1+0x8bf0]
[error occurred during error reporting (printing problematic frame), id 0xb]

# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007fed11282000):  JavaThread "SharedPool-Worker-93" daemon [_thread_new, id=29652, stack(0x0000000000000000,0x0000000000000000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fed948855a8

Registers:
RAX=0x00000000106bccb7, RBX=0x00007fed1129e000, RCX=0x00007fed1129eff0, RDX=0x0000000000000002
RSP=0x00007fcf3b172b50, RBP=0xffffffffffffffe8, RSI=0x00007fed1129e040, RDI=0x0000000000000000
R8 =0x00007fed3b4009c0, R9 =0x0000000000000000, R10=0x00007fcf3b172218, R11=0x0000000000000001
R12=0x0000000000000020, R13=0x0000000000000000, R14=0x0000000000000000, R15=0x0000000000000003
RIP=0x00007fed3c5edbf0, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007fcf3b172b50)
0x00007fcf3b172b50:   0000000000000000 0000000000000000
0x00007fcf3b172b60:   0000000000000000 0000000000000020
0x00007fcf3b172b70:   0000000000000020 00007fcf3b172d30
0x00007fcf3b172b80:   0000000000000000 00007fed3c5e8da5
0x00007fcf3b172b90:   0000000000000000 00007fcf3b173700
0x00007fcf3b172ba0:   00007fcf3b172d30 00007fed3c3d1afa
0x00007fcf3b172bb0:   0000000000000000 0000000000000000
0x00007fcf3b172bc0:   0000000000000000 0000000000000000
0x00007fcf3b172bd0:   0000000000000000 0000000000000000
0x00007fcf3b172be0:   0000000000000000 0000000000000000
0x00007fcf3b172bf0:   00007fcf3b172da8 00007fcf3b172d90
0x00007fcf3b172c00:   00007fcf3b172da0 00007fcf3b172d30
0x00007fcf3b172c10:   00007fed11282000 00007fed3a546c94
0x00007fcf3b172c20:   0000000000000000 0000000000000000
0x00007fcf3b172c30:   0000000000000000 0000000000000000
0x00007fcf3b172c40:   0000000000000000 0000000000000000
0x00007fcf3b172c50:   0000000000000000 0000000000000000
0x00007fcf3b172c60:   0000000000000000 0000000000000000
0x00007fcf3b172c70:   0000000000000000 0000000000000000
0x00007fcf3b172c80:   0000000000000000 0000000000000000
0x00007fcf3b172c90:   0000000000000000 0000000000000000
0x00007fcf3b172ca0:   0000000000000000 0000000000000000
0x00007fcf3b172cb0:   0000000000000000 0000000000000000
0x00007fcf3b172cc0:   0000000000000000 0000000000000000
0x00007fcf3b172cd0:   0000000000000000 0000000000000000
0x00007fcf3b172ce0:   0000000000000000 0000000000000000
0x00007fcf3b172cf0:   0000000000000000 00007fed11294728
0x00007fcf3b172d00:   0000000000000000 00007fed3c3d3104
0x00007fcf3b172d10:   0000000000000000 0000000000000000
0x00007fcf3b172d20:   0000000000000000 0000000000000000
0x00007fcf3b172d30:   0000000000000000 0000000000000009
0x00007fcf3b172d40:   0000000000000000 00007fcf3b174000 

Instructions: (pc=0x00007fed3c5edbf0)
0x00007fed3c5edbd0:   08 00 00 00 00 e9 c4 fe ff ff 66 0f 1f 44 00 00
0x00007fed3c5edbe0:   83 e8 01 3b 06 89 46 08 7d 02 89 06 48 8b 4e 10
0x00007fed3c5edbf0:   48 8b 2c c1 48 85 ed 0f 85 13 ff ff ff e9 f5 fe
0x00007fed3c5edc00:   ff ff 66 0f 1f 44 00 00 e8 a3 ae 00 00 0f 1f 00 

Register to memory mapping:

RAX=0x00000000106bccb7 is an unknown value
RBX=0x00007fed1129e000 is an unknown value
RCX=0x00007fed1129eff0 is an unknown value
RDX=0x0000000000000002 is an unknown value
RSP=0x00007fcf3b172b50 is an unknown value
RBP=0xffffffffffffffe8 is an unknown value
RSI=0x00007fed1129e040 is an unknown value
RDI=0x0000000000000000 is an unknown value
R8 =0x00007fed3b4009c0 is an unknown value
R9 =0x0000000000000000 is an unknown value
R10=0x00007fcf3b172218 is an unknown value
R11=0x0000000000000001 is an unknown value
R12=0x0000000000000020 is an unknown value
R13=0x0000000000000000 is an unknown value
R14=0x0000000000000000 is an unknown value
R15=0x0000000000000003 is an unknown value


Stack: [0x0000000000000000,0x0000000000000000],  sp=0x00007fcf3b172b50,  free space=137234400714k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libjemalloc.so.1+0x8bf0]
[error occurred during error reporting (printing native stack), id 0xb]
{code}
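
One thing that can be checked from the dump itself: the four bytes at the faulting pc, 48 8b 2c c1, decode (by my own reading of the x86-64 encoding; the JVM output does not state this) to "mov rbp, [rcx+rax*8]". Assuming that decoding is right, the address being read should be RCX + RAX*8. The short Python sketch below redoes that arithmetic with the register values from the dump and compares the result with the reported si_addr. The helper name effective_address is made up for illustration.

```python
# Register values and fault address copied from the hs_err output above.
RAX = 0x00000000106BCCB7
RCX = 0x00007FED1129EFF0
SI_ADDR = 0x00007FED948855A8


def effective_address(base: int, index: int, scale: int = 8) -> int:
    """Address of a base + index*scale memory operand, wrapped to 64 bits."""
    return (base + index * scale) & 0xFFFFFFFFFFFFFFFF


fault = effective_address(RCX, RAX)
print(hex(fault))        # prints 0x7fed948855a8
print(fault == SI_ADDR)  # prints True
```

The computed address matches si_addr, so RAX appears to be used as an array index at the point of the crash. Its value (0x106bccb7) looks far too large for an index, which might point to corrupted allocator state inside libjemalloc rather than a bad pointer handed over from Java code. That reading is a guess, not a diagnosis.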

> Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-11723
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11723
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefano Ortolani
>
> The upgrade itself seems fine, but any restart of a node can lead to a 
> situation where the node simply dies after 30 seconds to 1 minute. 
> Nothing shows up in the logs apart from many "FailureDetector.java:456 - 
> Ignoring interval time of 3000892567 for /10.12.a.x" messages in debug.log, 
> emitted every second (for all other nodes), plus some spurious 
> GraphiteReporter errors and ReadRepair notifications:
> {code:xml}
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
> Ignoring interval time of 2373187360 for /10.12.a.x
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
> Ignoring interval time of 2000276196 for /10.12.a.y
> DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - 
> Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
> DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) 
> (d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718)
>       at 
> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - 
> Ignoring interval time of 3000299340 for /10.12.33.5
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 
> ScheduledReporter.java:119 - RuntimeException thrown from 
> GraphiteReporter#report. Exception was suppressed.
> java.lang.IllegalStateException: Unable to compute ceiling for max when 
> histogram overflowed
>       at 
> org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>       at 
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>       at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>       at 
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_60]
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_60]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {code}
> I know this is not much, but nothing else is written to dmesg or to any 
> other log. Any suggestions on how to debug this further?
> I have upgraded two nodes so far, and it happened on both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
