Jeremiah Jordan created CASSANDRA-9124:
------------------------------------------
Summary: GCInspector logs very different times after CASSANDRA-7638
Key: CASSANDRA-9124
URL: https://issues.apache.org/jira/browse/CASSANDRA-9124
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Jeremiah Jordan
Priority: Minor
Fix For: 2.1.5
After the GCInspector rewrite in CASSANDRA-7638 the times reported for CMS are
the full time (including all the concurrent time), not just the stop the world
pause time. In previous versions we reported just the stop the world pause
time. This change is kind of scary for someone used to the old logs, and is
also not as useful. You can't get "how long were things really stopped" from
the log message any more.
For example, this is a CMS that got logged in C* 2.1:
{noformat}
INFO [Service Thread] 2015-04-03 23:58:37,583 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12926ms. CMS Old Gen: 5305346280 -> 1106799064; Par
Eden Space: 223080 -> 158423560; Par Survivor Space: 42081744 -> 51339584
{noformat}
And here is the corresponding information for that CMS from the gc log.
{noformat}
2015-04-03T23:58:24.656+0000: 8064.780: [GC [1 CMS-initial-mark:
5181002K(6901760K)] 5222315K(7639040K), 0.0316710 secs] [Times: user=0.03
sys=0.00, real=0.03 secs]
2015-04-03T23:58:24.688+0000: 8064.812: Total time for which application
threads were stopped: 0.0324490 seconds
2015-04-03T23:58:24.688+0000: 8064.812: [CMS-concurrent-mark-start]
2015-04-03T23:58:26.939+0000: 8067.062: [CMS-concurrent-mark: 2.176/2.250 secs]
[Times: user=12.94 sys=1.73, real=2.25 secs]
2015-04-03T23:58:26.939+0000: 8067.063: [CMS-concurrent-preclean-start]
2015-04-03T23:58:27.209+0000: 8067.333: [CMS-concurrent-preclean: 0.187/0.270
secs] [Times: user=1.53 sys=0.15, real=0.28 secs]
2015-04-03T23:58:27.210+0000: 8067.333:
[CMS-concurrent-abortable-preclean-start]
2015-04-03T23:58:27.988+0000: 8068.112: [CMS-concurrent-abortable-preclean:
0.759/0.779 secs] [Times: user=4.07 sys=0.74, real=0.77 secs]
2015-04-03T23:58:27.989+0000: 8068.113: [GC[YG occupancy: 488441 K (737280
K)]2015-04-03T23:58:27.989+0000: 8068.113: [Rescan (parallel) , 0.3688960
secs]2015-04-03T23:58:28.358+0000: 8068.482: [weak refs processing, 0.0009620
secs]2015-04-03T23:58:28.359+0000: 8068.483: [class unloading, 0.0060870
secs]2015-04-03T23:58:28.365+0000: 8068.489: [scrub symbol table, 0.0146010
secs]2015-04-03T23:58:28.380+0000: 8068.504: [scrub string table, 0.0031270
secs] [1 CMS-remark: 5231445K(6901760K)] 5719886K(7639040K), 0.3953770 secs]
[Times: user=2.96 sys=0.00, real=0.39 secs]
2015-04-03T23:58:28.385+0000: 8068.508: Total time for which application
threads were stopped: 0.3962470 seconds
2015-04-03T23:58:28.385+0000: 8068.509: [CMS-concurrent-sweep-start]
2015-04-03T23:58:37.582+0000: 8077.706: [CMS-concurrent-sweep: 8.661/9.197
secs] [Times: user=44.80 sys=9.58, real=9.20 secs]
2015-04-03T23:58:37.589+0000: 8077.713: [CMS-concurrent-reset-start]
2015-04-03T23:58:37.633+0000: 8077.757: [CMS-concurrent-reset: 0.044/0.044
secs] [Times: user=0.19 sys=0.10, real=0.04 secs]
{noformat}
The entire CMS took the 12 seconds reported in the GCIspector log message.
Previously we would have only reported the 0.39 seconds that were spent in STW
pauses.
At the least we need to change the log message so that people don't think we
are still just reporting STW time. But it would be more helpful if we could
get the STW time and put that into the log message like we had previously.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)