Hi all,

we are seeing the same kind of long pauses in Cassandra. We tried to switch
CMS to G1 without positive result. The stress test is read heavy, 2
datacenters, 6 nodes, 400reqs/sec on one datacenter. We see spikes in
latency on 99.99 percentil and higher, caused by threads being stopped in
JVM.

The GC in G1 looks like this:

{Heap before GC invocations=4073 (full 1):
garbage-first heap   total 8388608K, used 3602914K [0x00000005f5c00000,
0x00000007f5c00000, 0x00000007f5c00000)
 region size 4096K, 142 young (581632K), 11 survivors (45056K)
compacting perm gen  total 28672K, used 27428K [0x00000007f5c00000,
0x00000007f7800000, 0x0000000800000000)
  the space 28672K,  95% used [0x00000007f5c00000, 0x00000007f76c9108,
0x00000007f76c9200, 0x00000007f7800000)
No shared spaces configured.
2014-02-17T04:44:16.385+0100: 222346.218: [GC pause (G1 Evacuation Pause)
(young)
Desired survivor size 37748736 bytes, new threshold 15 (max 15)
- age   1:   17213632 bytes,   17213632 total
- age   2:   19391208 bytes,   36604840 total
, 0.1664300 secs]
  [Parallel Time: 163.9 ms, GC Workers: 2]
     [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max:
222346218.3, Diff: 0.0]
     [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7, Sum:
13.7]
     [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum: 42.6]
        [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22, Sum: 120]
     [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum: 46.5]
     [Object Copy (ms): Min: 112.3, Avg: 112.3, Max: 112.4, Diff: 0.1, Sum:
224.6]
     [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.1]
     [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
0.1]
     [GC Worker Total (ms): Min: 163.8, Avg: 163.8, Max: 163.8, Diff: 0.0,
Sum: 327.6]
     [GC Worker End (ms): Min: 222346382.1, Avg: 222346382.1, Max:
222346382.1, Diff: 0.0]
  [Code Root Fixup: 0.0 ms]
  [Clear CT: 0.4 ms]
  [Other: 2.1 ms]
     [Choose CSet: 0.0 ms]
     [Ref Proc: 1.1 ms]
     [Ref Enq: 0.0 ms]
     [Free CSet: 0.4 ms]
  [Eden: 524.0M(524.0M)->0.0B(476.0M) Survivors: 44.0M->68.0M Heap:
3518.5M(8192.0M)->3018.5M(8192.0M)]
Heap after GC invocations=4074 (full 1):
garbage-first heap   total 8388608K, used 3090914K [0x00000005f5c00000,
0x00000007f5c00000, 0x00000007f5c00000)
 region size 4096K, 17 young (69632K), 17 survivors (69632K)
compacting perm gen  total 28672K, used 27428K [0x00000007f5c00000,
0x00000007f7800000, 0x0000000800000000)
  the space 28672K,  95% used [0x00000007f5c00000, 0x00000007f76c9108,
0x00000007f76c9200, 0x00000007f7800000)
No shared spaces configured.
}
[Times: user=0.35 sys=0.00, real=27.58 secs]
222346.219: G1IncCollectionPause             [     111          0
   0    ]      [     0     0     0     0 27586    ]  0

And the total thime for which application threads were stopped is 27.58
seconds.

CMS behaves in a similar manner. We thought it would be GC, waiting for
mmaped files being read from disk (the thread cannot reach safepoint during
this operation), but it doesn't explain the huge time.

We'll try jhiccup to see if it provides any additional information. The
test was done on mixed aws/openstack environment, openjdk 1.7.0_45,
cassandra 1.2.11. Upgrading to 2.0.x is no option for us.

regards,

ondrej cernos


On Fri, Feb 14, 2014 at 8:53 PM, Frank Ng <fnt...@gmail.com> wrote:

> Sorry, I have not had a chance to file a JIRA ticket.  We have not been
> able to resolve the issue.  But since Joel mentioned that upgrading to
> Cassandra 2.0.X solved it for them, we may need to upgrade.  We are
> currently on Java 1.7 and Cassandra 1.2.8
>
>
>
> On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright <kwri...@nanigans.com>wrote:
>
>> You're running 2.0.* in production?  May I ask what C* version and OS?
>>  Any hardware details would be appreciated as well.  Thx!
>>
>> From: Joel Samuelsson <samuelsson.j...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Thursday, February 13, 2014 at 11:39 AM
>>
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Intermittent long application pauses on nodes
>>
>> We have had similar issues and upgrading C* to 2.0.x and Java to 1.7
>> seems to have helped our issues.
>>
>>
>> 2014-02-13 Keith Wright <kwri...@nanigans.com>:
>>
>>> Frank did you ever file a ticket for this issue or find the root cause?
>>>  I believe we are seeing the same issues when attempting to bootstrap.
>>>
>>> Thanks
>>>
>>> From: Robert Coli <rc...@eventbrite.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Monday, February 3, 2014 at 6:10 PM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: Intermittent long application pauses on nodes
>>>
>>> On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith <
>>> belliottsm...@datastax.com> wrote:
>>>
>>>>
>>>> It's possible that this is a JVM issue, but if so there may be some
>>>> remedial action we can take anyway. There are some more flags we should
>>>> add, but we can discuss that once you open a ticket. If you could include
>>>> the strange JMX error as well, that might be helpful.
>>>>
>>>
>>> It would be appreciated if you could inform this thread of the JIRA
>>> ticket number, for the benefit of the community and google searchers. :)
>>>
>>> =Rob
>>>
>>
>>
>

Reply via email to