Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-12-04 Thread Man Cao
On Mon, 4 Dec 2023 13:07:15 GMT, Stefan Johansson  wrote:

>> In the interest of the RDP1 deadline, should we leave improving the sync 
>> issues with gc_total to a separate RFE? (Especially given that a "correct" 
>> design may take some time to come up with, and that gc_total being slightly 
>> out of sync is not a major issue.)
>
> Me and Albert discussed this again and we are ok with handling the `gc_total` 
> sync issue as a follow up. Please create the RFE for that. If that would 
> include needing a `PerfAtomicCounter`, that would a be its own RFE as well. 
> For me I think a lock would be a good enough solution. 
> 
> From our point of view having the counters out of sync for a long period of 
> time (think a long concurrent mark cycle without any young collections 
> updating the total) is not good since it shows that the counters are not 
> incremented in sync. It would also be nice to avoid the two-step updating of 
> the total time, so please try to find time to work on this.

Thanks, opened https://bugs.openjdk.org/browse/JDK-8321304.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1414634977


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-12-04 Thread Stefan Johansson
On Fri, 1 Dec 2023 22:37:58 GMT, Jonathan Joo  wrote:

>> I think the ideal approach to simplify this is to support Atomic operation 
>> on a `PerfCounter`.
>> We could either introduce a `PerfAtomicCounter`/`PerfAtomicLongCounter` 
>> class, or perform `Atomic::add()` on the `PerfData::_valuep` pointer. 
>> There's already `PerfData::get_address()`, so we might be able to do:
>> 
>> 
>> Atomic::add((volatile jlong 
>> *)(instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->get_address()),
>>  net_cpu_time);
>> 
>> 
>> However, a new class `PerfAtomicCounter`  is likely cleaner. E.g., we may 
>> also want to make `PerfAtomicCounter::sample()` use a CAS. It is probably 
>> better to introduce `PerfAtomicCounter` in a separate RFE later.
>> 
>> Would the `Atomic::add()` with `PerfData::get_address()` approach be OK for 
>> now, or would we rather introduce a lock, or leave the `gc_total` mechanism 
>> as-is and address the out-of-sync-ness in a follow-up RFE?
>> 
>> IMO the out-of-sync-ness problem is minor, because users are likely to 
>> either look at a single `gc_total` counter, or look at each individual GC  
>> CPU counter and disregard `gc_total`.
>
> In the interest of the RDP1 deadline, should we leave improving the sync 
> issues with gc_total to a separate RFE? (Especially given that a "correct" 
> design may take some time to come up with, and that gc_total being slightly 
> out of sync is not a major issue.)

Me and Albert discussed this again and we are ok with handling the `gc_total` 
sync issue as a follow up. Please create the RFE for that. If that would 
include needing a `PerfAtomicCounter`, that would a be its own RFE as well. For 
me I think a lock would be a good enough solution. 

>From our point of view having the counters out of sync for a long period of 
>time (think a long concurrent mark cycle without any young collections 
>updating the total) is not good since it shows that the counters are not 
>incremented in sync. It would also be nice to avoid the two-step updating of 
>the total time, so please try to find time to work on this.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1413848440


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-12-01 Thread Jonathan Joo
On Fri, 1 Dec 2023 21:01:29 GMT, Man Cao  wrote:

>> I agree that the counter is valuable if always up-to-date, but if it is out 
>> of sync compared to the "concurrent counters" I think it will confuse some 
>> users. So if we want to keep it I think we should try to keep it in sync. 
>> 
>> I suppose adding a lock for updating `gc_total` should be ok. In the 
>> safepoint case we should never contend on that lock and for the concurrent 
>> updates it should not be a big deal. Basically what I think would be to 
>> change `update_counter(...)` to do something like:
>> 
>> if (CPUTimeGroups::is_gc_counter(name)) {
>>   
>>   
>> instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(net_cpu_time);
>> }
>> 
>> 
>> This way we would also be able to remove the publish part above, right? Any 
>> other problems with this approach?
>
> I think the ideal approach to simplify this is to support Atomic operation on 
> a `PerfCounter`.
> We could either introduce a `PerfAtomicCounter`/`PerfAtomicLongCounter` 
> class, or perform `Atomic::add()` on the `PerfData::_valuep` pointer. There's 
> already `PerfData::get_address()`, so we might be able to do:
> 
> 
> Atomic::add((volatile jlong 
> *)(instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->get_address()),
>  net_cpu_time);
> 
> 
> However, a new class `PerfAtomicCounter`  is likely cleaner. E.g., we may 
> also want to make `PerfAtomicCounter::sample()` use a CAS. It is probably 
> better to introduce `PerfAtomicCounter` in a separate RFE later.
> 
> Would the `Atomic::add()` with `PerfData::get_address()` approach be OK for 
> now, or would we rather introduce a lock, or leave the `gc_total` mechanism 
> as-is and address the out-of-sync-ness in a follow-up RFE?
> 
> IMO the out-of-sync-ness problem is minor, because users are likely to either 
> look at a single `gc_total` counter, or look at each individual GC  CPU 
> counter and disregard `gc_total`.

In the interest of the RDP1 deadline, should we leave improving the sync issues 
with gc_total to a separate RFE? (Especially given that a "correct" design may 
take some time to come up with, and that gc_total being slightly out of sync is 
not a major issue.)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1412631873


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-12-01 Thread Man Cao
On Fri, 1 Dec 2023 10:21:31 GMT, Stefan Johansson  wrote:

>> Right, see @simonis 's comments at 
>> https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, 
>> https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912.
>> 
>> I initially had similar thought that `gc_total` isn't necessary and provides 
>> redundant data. Now I agree with @simonis that the `gc_total` is valuable to 
>> users. It saves users from extra work of aggregating different sets of 
>> counters for different garbage collectors, and potential mistakes of missing 
>> some counters. It is also future-proof that if GC implementation changes 
>> that add additional threads, users wouldn't need to change their code to add 
>> the counter for additional threads.
>> 
>> I think the maintenance overhead is quite small for `gc_total` since it is 
>> mostly in this class. The benefit to users is worth it.
>
> I agree that the counter is valuable if always up-to-date, but if it is out 
> of sync compared to the "concurrent counters" I think it will confuse some 
> users. So if we want to keep it I think we should try to keep it in sync. 
> 
> I suppose adding a lock for updating `gc_total` should be ok. In the 
> safepoint case we should never contend on that lock and for the concurrent 
> updates it should not be a big deal. Basically what I think would be to 
> change `update_counter(...)` to do something like:
> 
> if (CPUTimeGroups::is_gc_counter(name)) {
>   
>   
> instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(net_cpu_time);
> }
> 
> 
> This way we would also be able to remove the publish part above, right? Any 
> other problems with this approach?

I think the ideal approach to simplify this is to support Atomic operation on a 
`PerfCounter`.
We could either introduce a `PerfAtomicCounter`/`PerfAtomicLongCounter` class, 
or perform `Atomic::add()` on the `PerfData::_valuep` pointer. There's already 
`PerfData::get_address()`, so we might be able to do:


Atomic::add((volatile jlong 
*)(instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->get_address()), 
net_cpu_time);


However, a new class `PerfAtomicCounter`  is likely cleaner. E.g., we may also 
want to make `PerfAtomicCounter::sample()` use a CAS. It is probably better to 
introduce `PerfAtomicCounter` in a separate RFE later.

Would the `Atomic::add()` with `PerfData::get_address()` approach be OK for 
now, or would we rather introduce a lock, or leave the `gc_total` mechanism 
as-is and address the out-of-sync-ness in a follow-up RFE?

IMO the out-of-sync-ness problem is minor, because users are likely to either 
look at a single `gc_total` counter, or look at each individual GC  CPU counter 
and disregard `gc_total`.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1412569208


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-12-01 Thread Stefan Johansson
On Thu, 30 Nov 2023 21:43:36 GMT, Man Cao  wrote:

>> @simonis was the original suggester of this counter, so I will defer to his 
>> expertise. I do agree that dropping the counter would simplify things, but 
>> it also might not hurt to just leave it in. I'm okay with either option!
>
> Right, see @simonis 's comments at 
> https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, 
> https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912.
> 
> I initially had similar thought that `gc_total` isn't necessary and provides 
> redundant data. Now I agree with @simonis that the `gc_total` is valuable to 
> users. It saves users from extra work of aggregating different sets of 
> counters for different garbage collectors, and potential mistakes of missing 
> some counters. It is also future-proof that if GC implementation changes that 
> add additional threads, users wouldn't need to change their code to add the 
> counter for additional threads.
> 
> I think the maintenance overhead is quite small for `gc_total` since it is 
> mostly in this class. The benefit to users is worth it.

I agree that the counter is valuable if always up-to-date, but if it is out of 
sync compared to the "concurrent counters" I think it will confuse some users. 
So if we want to keep it I think we should try to keep it in sync. 

I suppose adding a lock for updating `gc_total` should be ok. In the safepoint 
case we should never contend on that lock and for the concurrent updates it 
should not be a big deal. Basically what I think would be to change 
`update_counter(...)` to do something like:

if (CPUTimeGroups::is_gc_counter(name)) {
  
  
instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(net_cpu_time);
}


This way we would also be able to remove the publish part above, right? Any 
other problems with this approach?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411897115


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-30 Thread Man Cao
On Thu, 30 Nov 2023 21:17:14 GMT, Jonathan Joo  wrote:

>> Me and Albert just spoke and we do see the problem that two concurrent 
>> threads could be executing the closure at the same time. So if having a 
>> total counter we need to sync the updates. But when talking we started to 
>> question how useful it is to have the gc_total counter. It is just an 
>> aggregate of the other gc-counters, but it is out of sync between 
>> safepoints. So you will always get a more accurate value by looking at the 
>> individual gc-counters.
>> 
>> We came to the conclusion that it would probably be easier to drop 
>> `gc_total` right now, rather than trying to keep it in sync for all updates 
>> to the individual counters. Because having them out of sync doesn't feel 
>> like a great option. 
>> 
>> Are we missing anything or do you agree?
>
> @simonis was the original suggester of this counter, so I will defer to his 
> expertise. I do agree that dropping the counter would simplify things, but it 
> also might not hurt to just leave it in. I'm okay with either option!

Right, see @simonis 's comments at 
https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, 
https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912.

I initially had similar thought that `gc_total` isn't necessary and provides 
redundant data. Now I agree with @simonis that the `gc_total` is valuable to 
users. It saves users from extra work of aggregating different sets of counters 
for different garbage collectors, and potential mistakes of missing some 
counters. It is also future-proof that if GC implementation changes that add 
additional threads, users wouldn't need to change their code to add the counter 
for additional threads.

I think the maintenance overhead is quite small for `gc_total` since it is 
mostly in this class. The benefit to users is worth it.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411293819


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-30 Thread Jonathan Joo
On Thu, 30 Nov 2023 09:30:14 GMT, Stefan Johansson  wrote:

>> Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called 
>> by the vm-thread inside a safepoint, so there shouldn't be any other threads 
>> running simultaneously, I believe.
>
> Me and Albert just spoke and we do see the problem that two concurrent 
> threads could be executing the closure at the same time. So if having a total 
> counter we need to sync the updates. But when talking we started to question 
> how useful it is to have the gc_total counter. It is just an aggregate of the 
> other gc-counters, but it is out of sync between safepoints. So you will 
> always get a more accurate value by looking at the individual gc-counters.
> 
> We came to the conclusion that it would probably be easier to drop `gc_total` 
> right now, rather than trying to keep it in sync for all updates to the 
> individual counters. Because having them out of sync doesn't feel like a 
> great option. 
> 
> Are we missing anything or do you agree?

@simonis was the original suggester of this counter, so I will defer to his 
expertise. I do agree that dropping the counter would simplify things, but it 
also might not hurt to just leave it in. I'm okay with either option!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411270546


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-30 Thread Stefan Johansson
On Thu, 30 Nov 2023 06:45:02 GMT, Albert Mingkun Yang  wrote:

>> I agree that in the case that multiple closures are not active at the same 
>> time, we wouldn't have to implement it in this way. However, isn't it 
>> possible to have multiple closures active simultaneously, e.g.  vm thread, 
>> concurrent mark thread, concurrent refine thread? I think to account for the 
>> races there, we can only update the `_gc_total_cpu_time_diff` member 
>> variable atomically during these closure destructions, and then publish the 
>> actual updated `gc_total` counter at manually specified times via 
>> `publish_gc_total_cpu_time()`. If we were to call 
>> `publish_gc_total_cpu_time` as part of the thread closure, I believe it 
>> would be difficult to prevent races when accessing the underlying counter 
>> from the various gc-related threads.
>> 
>> Or maybe there is another strategy that I'm not seeing?
>
> Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called 
> by the vm-thread inside a safepoint, so there shouldn't be any other threads 
> running simultaneously, I believe.

Me and Albert just spoke and we do see the problem that two concurrent threads 
could be executing the closure at the same time. So if having a total counter 
we need to sync the updates. But when talking we started to question how useful 
it is to have the gc_total counter. It is just an aggregate of the other 
gc-counters, but it is out of sync between safepoints. So you will always get a 
more accurate value by looking at the individual gc-counters.

We came to the conclusion that it would probably be easier to drop `gc_total` 
right now, rather than trying to keep it in sync for all updates to the 
individual counters. Because having them out of sync doesn't feel like a great 
option. 

Are we missing anything or do you agree?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410394993


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-29 Thread Albert Mingkun Yang
On Thu, 30 Nov 2023 02:39:39 GMT, Jonathan Joo  wrote:

>> This two-step update does seem unnecessary, IMO.
>
> I agree that in the case that multiple closures are not active at the same 
> time, we wouldn't have to implement it in this way. However, isn't it 
> possible to have multiple closures active simultaneously, e.g.  vm thread, 
> concurrent mark thread, concurrent refine thread? I think to account for the 
> races there, we can only update the `_gc_total_cpu_time_diff` member variable 
> atomically during these closure destructions, and then publish the actual 
> updated `gc_total` counter at manually specified times via 
> `publish_gc_total_cpu_time()`. If we were to call `publish_gc_total_cpu_time` 
> as part of the thread closure, I believe it would be difficult to prevent 
> races when accessing the underlying counter from the various gc-related 
> threads.
> 
> Or maybe there is another strategy that I'm not seeing?

Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called by 
the vm-thread inside a safepoint, so there shouldn't be any other threads 
running simultaneously, I believe.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410221646


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-29 Thread Jonathan Joo
On Wed, 29 Nov 2023 15:24:52 GMT, Albert Mingkun Yang  wrote:

>> src/hotspot/share/runtime/cpuTimeCounters.cpp line 91:
>> 
>>> 89:   } while (old_value != fetched_value);
>>> 90:   get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value);
>>> 91: }
>> 
>> Why do we have to do this publish dance? Couldn't the closure that update 
>> the diff instead just update the counter? From what I can tell we never have 
>> multiple closures active at the same time so should be no race there?
>
> This two-step update does seem unnecessary, IMO.

I agree that in the case that multiple closures are not active at the same 
time, we wouldn't have to implement it in this way. However, isn't it possible 
to have multiple closures active simultaneously, e.g.  vm thread, concurrent 
mark thread, concurrent refine thread? I think to account for the races there, 
we can only update the `_gc_total_cpu_time_diff` member variable atomically 
during these closure destructions, and then publish the actual updated 
`gc_total` counter at manually specified times via 
`publish_gc_total_cpu_time()`. If we were to call `publish_gc_total_cpu_time` 
as part of the thread closure, I believe it would be difficult to prevent races 
when accessing the underlying counter from the various gc-related threads.

Or maybe there is another strategy that I'm not seeing?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410089070


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-29 Thread Albert Mingkun Yang
On Wed, 29 Nov 2023 08:24:22 GMT, Stefan Johansson  wrote:

>> Jonathan Joo has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - Fix namespace issues (2)
>>
>>Co-authored-by: Stefan Johansson 
>> <54407259+kstef...@users.noreply.github.com>
>>  - Fix namespace issues
>>
>>Co-authored-by: Stefan Johansson 
>> <54407259+kstef...@users.noreply.github.com>
>
> src/hotspot/share/runtime/cpuTimeCounters.cpp line 91:
> 
>> 89:   } while (old_value != fetched_value);
>> 90:   get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value);
>> 91: }
> 
> Why do we have to do this publish dance? Couldn't the closure that update the 
> diff instead just update the counter? From what I can tell we never have 
> multiple closures active at the same time so should be no race there?

This two-step update does seem unnecessary, IMO.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1409452326


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-29 Thread Stefan Johansson
On Tue, 28 Nov 2023 02:22:45 GMT, Jonathan Joo  wrote:

>> 8315149: Add hsperf counters for CPU time of internal GC threads
>
> Jonathan Joo has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Fix namespace issues (2)
>
>Co-authored-by: Stefan Johansson 
> <54407259+kstef...@users.noreply.github.com>
>  - Fix namespace issues
>
>Co-authored-by: Stefan Johansson 
> <54407259+kstef...@users.noreply.github.com>

>From a testing point of view I think this is looking good now, re-ran the 
>failing test and some other jstat tests as well and they all pass.

Please address the comments from Albert and we can hopefully finish this before 
RDP1.

src/hotspot/share/runtime/cpuTimeCounters.cpp line 91:

> 89:   } while (old_value != fetched_value);
> 90:   get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value);
> 91: }

Why do we have to do this publish dance? Couldn't the closure that update the 
diff instead just update the counter? From what I can tell we never have 
multiple closures active at the same time so should be no race there?

-

PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1754686415
PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1408914724


Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]

2023-11-27 Thread Jonathan Joo
> 8315149: Add hsperf counters for CPU time of internal GC threads

Jonathan Joo has updated the pull request incrementally with two additional 
commits since the last revision:

 - Fix namespace issues (2)
   
   Co-authored-by: Stefan Johansson <54407259+kstef...@users.noreply.github.com>
 - Fix namespace issues
   
   Co-authored-by: Stefan Johansson <54407259+kstef...@users.noreply.github.com>

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15082/files
  - new: https://git.openjdk.org/jdk/pull/15082/files/fcc7e471..abb90258

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=15082=47
 - incr: https://webrevs.openjdk.org/?repo=jdk=15082=46-47

  Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/15082.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082

PR: https://git.openjdk.org/jdk/pull/15082