[jira] [Commented] (CASSANDRA-10730) periodic timeout errors in dtest

Ariel Weisberg (JIRA) Wed, 02 Dec 2015 11:16:10 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036409#comment-15036409 ]


Ariel Weisberg commented on CASSANDRA-10730:
--------------------------------------------

I am not so concerned at this point about the maximum heap size. The free 
number for the old generation looks odd, doesn't it? I wonder if we are looking 
at a corrupt JVM. We could also try switching to the parallel collector and see 
whether that produces a different error, a better error, or no error at all.

Here is the output for my local Eclipse instance:
{code}
Ariels-MBP:java aweisberg$ jmap -heap 250
Attaching to process ID 250, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.60-b23

using thread-local object allocation.
Parallel GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 1073741824 (1024.0MB)
   NewSize                  = 89128960 (85.0MB)
   MaxNewSize               = 357564416 (341.0MB)
   OldSize                  = 179306496 (171.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 225443840 (215.0MB)
   used     = 3482008 (3.3207015991210938MB)
   free     = 221961832 (211.6792984008789MB)
   1.5445123716842297% used
From Space:
   capacity = 11534336 (11.0MB)
   used     = 0 (0.0MB)
   free     = 11534336 (11.0MB)
   0.0% used
To Space:
   capacity = 12582912 (12.0MB)
   used     = 0 (0.0MB)
   free     = 12582912 (12.0MB)
   0.0% used
PS Old Generation
   capacity = 613941248 (585.5MB)
   used     = 168407608 (160.60601043701172MB)
   free     = 445533640 (424.8939895629883MB)
   27.430573943127534% used

42936 interned Strings occupying 4320240 bytes.
{code}

Here is the output after I switched to CMS:
{code}
Ariels-MBP:Eclipse aweisberg$ jmap -heap 7220
Attaching to process ID 7220, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.60-b23

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 697892864 (665.5625MB)
   MaxNewSize               = 697892864 (665.5625MB)
   OldSize                  = 375848960 (358.4375MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 628162560 (599.0625MB)
   used     = 476857008 (454.7662811279297MB)
   free     = 151305552 (144.2962188720703MB)
   75.91299424149061% used
Eden Space:
   capacity = 558432256 (532.5625MB)
   used     = 407126712 (388.2662887573242MB)
   free     = 151305544 (144.29621124267578MB)
   72.90530008352526% used
From Space:
   capacity = 69730304 (66.5MB)
   used     = 69730296 (66.49999237060547MB)
   free     = 8 (7.62939453125E-6MB)
   99.99998852722626% used
To Space:
   capacity = 69730304 (66.5MB)
   used     = 0 (0.0MB)
   free     = 69730304 (66.5MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 375848960 (358.4375MB)
   used     = 22865096 (21.80585479736328MB)
   free     = 352983864 (336.6316452026367MB)
   6.0835863427691805% used

47785 interned Strings occupying 4807056 bytes.
{code}

Here it is with G1 GC:
{code}
Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 2576351232 (2457.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
   regions  = 4096
   capacity = 4294967296 (4096.0MB)
   used     = 186122248 (177.50000762939453MB)
   free     = 4108845048 (3918.4999923706055MB)
   4.333496280014515% used
G1 Young Generation:
Eden Space:
   regions  = 68
   capacity = 328204288 (313.0MB)
   used     = 71303168 (68.0MB)
   free     = 256901120 (245.0MB)
   21.72523961661342% used
Survivor Space:
   regions  = 74
   capacity = 77594624 (74.0MB)
   used     = 77594624 (74.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 37
   capacity = 667942912 (637.0MB)
   used     = 36175880 (34.50000762939453MB)
   free     = 631767032 (602.4999923706055MB)
   5.41601375657685% used

47715 interned Strings occupying 4797216 bytes.
{code}
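
One way to make "the free number looks odd" checkable is to confirm that used + free equals capacity for every section of a jmap -heap dump. The following is only a rough sketch in Python: the function names parse_heap_usage and inconsistent_sections are made up for illustration, and the parser assumes the "Heap Usage:" layout shown in the dumps above (it also tolerates the ">From" escaping that some mail archives add).

```python
import re

# Matches lines such as "   capacity = 613941248 (585.5MB)".
FIELD_RE = re.compile(r"^\s*(capacity|used|free)\s*=\s*(\d+)\b")


def parse_heap_usage(text):
    """Return {section name: {"capacity": int, "used": int, "free": int}}.

    Hypothetical helper: assumes the layout printed by `jmap -heap` above.
    """
    sections = {}
    current = None
    in_usage = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Heap Usage:":
            in_usage = True
            continue
        if not in_usage or not stripped:
            continue
        match = FIELD_RE.match(line)
        if match:
            if current is not None:
                sections[current][match.group(1)] = int(match.group(2))
        elif not line[0].isspace() and not stripped[0].isdigit():
            # A non-indented, non-numeric line starts a new section,
            # e.g. "Eden Space:" or "PS Old Generation".
            current = stripped.lstrip(">").rstrip(":")
            sections[current] = {}
    # Drop grouping headers ("PS Young Generation") that carry no numbers.
    return {name: f for name, f in sections.items() if len(f) == 3}


def inconsistent_sections(sections):
    """Names of sections where used + free differs from capacity."""
    return [
        name
        for name, f in sections.items()
        if f["used"] + f["free"] != f["capacity"]
    ]
```

Feeding it the output of jmap -heap for a suspect node and printing inconsistent_sections(parse_heap_usage(output)) would list any generation whose numbers do not add up. An empty list does not prove the JVM is healthy; it only rules out this one kind of inconsistency.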

> periodic timeout errors in dtest
> --------------------------------
>
>                 Key: CASSANDRA-10730
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10730
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jim Witschey
>            Assignee: Jim Witschey
>
> Dtests often fail with connection timeout errors. For example:
> http://cassci.datastax.com/job/cassandra-3.1_dtest/lastCompletedBuild/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3/deletion_test/
> {code}
> ('Unable to connect to any servers', {'127.0.0.1': 
> OperationTimedOut('errors=Timed out creating connection (10 seconds), 
> last_host=None',)})
> {code}
> We've merged a PR to increase timeouts:
> https://github.com/riptano/cassandra-dtest/pull/663
> It doesn't look like this has improved things:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/363/testReport/
> Next steps here are
> * to scrape Jenkins history to see if and how the number of tests failing 
> this way has increased (it feels like it has). From there we can bisect over 
> the dtests, ccm, or C*, depending on what looks like the source of the 
> problem.
> * to better instrument the dtest/ccm/C* startup process to see why the nodes 
> start but don't successfully make the CQL port available.
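
As a starting point for the instrumentation step in the description above, a test harness could record how long a node takes to accept TCP connections on its CQL port. This is a minimal sketch, not the dtest or ccm implementation: wait_for_port is a hypothetical helper, and it only checks that the port accepts a connection, not that the CQL handshake succeeds.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Poll until a TCP connect to (host, port) succeeds.

    Returns the elapsed seconds on success, or None if `timeout`
    seconds pass first. Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    while True:
        remaining = timeout - (time.monotonic() - start)
        if remaining <= 0:
            return None
        try:
            # Cap each attempt so one slow connect cannot eat the whole budget.
            with socket.create_connection((host, port), timeout=min(remaining, 1.0)):
                return time.monotonic() - start
        except OSError:
            # Refused or timed out: wait briefly, then retry.
            time.sleep(min(interval, remaining))
```

For example, wait_for_port("127.0.0.1", 9042, timeout=10.0) polls Cassandra's default native transport port for up to the same 10 seconds quoted in the error above. Logging the returned value per node would show whether startup is merely slow or the port never opens.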



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
