Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Mon, 4 Dec 2023 13:07:15 GMT, Stefan Johansson wrote: >> In the interest of the RDP1 deadline, should we leave improving the sync >> issues with gc_total to a separate RFE? (Especially given that a "correct" >> design may take some time to come up with, and that gc_total being slightly >> out of sync is not a major issue.) > > Me and Albert discussed this again and we are ok with handling the `gc_total` > sync issue as a follow up. Please create the RFE for that. If that would > include needing a `PerfAtomicCounter`, that would a be its own RFE as well. > For me I think a lock would be a good enough solution. > > From our point of view having the counters out of sync for a long period of > time (think a long concurrent mark cycle without any young collections > updating the total) is not good since it shows that the counters are not > incremented in sync. It would also be nice to avoid the two-step updating of > the total time, so please try to find time to work on this. Thanks, opened https://bugs.openjdk.org/browse/JDK-8321304. - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1414634977
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Fri, 1 Dec 2023 22:37:58 GMT, Jonathan Joo wrote: >> I think the ideal approach to simplify this is to support Atomic operation >> on a `PerfCounter`. >> We could either introduce a `PerfAtomicCounter`/`PerfAtomicLongCounter` >> class, or perform `Atomic::add()` on the `PerfData::_valuep` pointer. >> There's already `PerfData::get_address()`, so we might be able to do: >> >> >> Atomic::add((volatile jlong >> *)(instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->get_address()), >> net_cpu_time); >> >> >> However, a new class `PerfAtomicCounter` is likely cleaner. E.g., we may >> also want to make `PerfAtomicCounter::sample()` use a CAS. It is probably >> better to introduce `PerfAtomicCounter` in a separate RFE later. >> >> Would the `Atomic::add()` with `PerfData::get_address()` approach be OK for >> now, or would we rather introduce a lock, or leave the `gc_total` mechanism >> as-is and address the out-of-sync-ness in a follow-up RFE? >> >> IMO the out-of-sync-ness problem is minor, because users are likely to >> either look at a single `gc_total` counter, or look at each individual GC >> CPU counter and disregard `gc_total`. > > In the interest of the RDP1 deadline, should we leave improving the sync > issues with gc_total to a separate RFE? (Especially given that a "correct" > design may take some time to come up with, and that gc_total being slightly > out of sync is not a major issue.) Me and Albert discussed this again and we are ok with handling the `gc_total` sync issue as a follow up. Please create the RFE for that. If that would include needing a `PerfAtomicCounter`, that would a be its own RFE as well. For me I think a lock would be a good enough solution. >From our point of view having the counters out of sync for a long period of >time (think a long concurrent mark cycle without any young collections >updating the total) is not good since it shows that the counters are not >incremented in sync. It would also be nice to avoid the two-step updating of >the total time, so please try to find time to work on this. - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1413848440
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Fri, 1 Dec 2023 21:01:29 GMT, Man Cao wrote: >> I agree that the counter is valuable if always up-to-date, but if it is out >> of sync compared to the "concurrent counters" I think it will confuse some >> users. So if we want to keep it I think we should try to keep it in sync. >> >> I suppose adding a lock for updating `gc_total` should be ok. In the >> safepoint case we should never contend on that lock and for the concurrent >> updates it should not be a big deal. Basically what I think would be to >> change `update_counter(...)` to do something like: >> >> if (CPUTimeGroups::is_gc_counter(name)) { >> >> >> instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(net_cpu_time); >> } >> >> >> This way we would also be able to remove the publish part above, right? Any >> other problems with this approach? > > I think the ideal approach to simplify this is to support Atomic operation on > a `PerfCounter`. > We could either introduce a `PerfAtomicCounter`/`PerfAtomicLongCounter` > class, or perform `Atomic::add()` on the `PerfData::_valuep` pointer. There's > already `PerfData::get_address()`, so we might be able to do: > > > Atomic::add((volatile jlong > *)(instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->get_address()), > net_cpu_time); > > > However, a new class `PerfAtomicCounter` is likely cleaner. E.g., we may > also want to make `PerfAtomicCounter::sample()` use a CAS. It is probably > better to introduce `PerfAtomicCounter` in a separate RFE later. > > Would the `Atomic::add()` with `PerfData::get_address()` approach be OK for > now, or would we rather introduce a lock, or leave the `gc_total` mechanism > as-is and address the out-of-sync-ness in a follow-up RFE? > > IMO the out-of-sync-ness problem is minor, because users are likely to either > look at a single `gc_total` counter, or look at each individual GC CPU > counter and disregard `gc_total`. In the interest of the RDP1 deadline, should we leave improving the sync issues with gc_total to a separate RFE? (Especially given that a "correct" design may take some time to come up with, and that gc_total being slightly out of sync is not a major issue.) - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1412631873
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Fri, 1 Dec 2023 10:21:31 GMT, Stefan Johansson wrote: >> Right, see @simonis 's comments at >> https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, >> https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912. >> >> I initially had similar thought that `gc_total` isn't necessary and provides >> redundant data. Now I agree with @simonis that the `gc_total` is valuable to >> users. It saves users from extra work of aggregating different sets of >> counters for different garbage collectors, and potential mistakes of missing >> some counters. It is also future-proof that if GC implementation changes >> that add additional threads, users wouldn't need to change their code to add >> the counter for additional threads. >> >> I think the maintenance overhead is quite small for `gc_total` since it is >> mostly in this class. The benefit to users is worth it. > > I agree that the counter is valuable if always up-to-date, but if it is out > of sync compared to the "concurrent counters" I think it will confuse some > users. So if we want to keep it I think we should try to keep it in sync. > > I suppose adding a lock for updating `gc_total` should be ok. In the > safepoint case we should never contend on that lock and for the concurrent > updates it should not be a big deal. Basically what I think would be to > change `update_counter(...)` to do something like: > > if (CPUTimeGroups::is_gc_counter(name)) { > > > instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(net_cpu_time); > } > > > This way we would also be able to remove the publish part above, right? Any > other problems with this approach? I think the ideal approach to simplify this is to support Atomic operation on a `PerfCounter`. We could either introduce a `PerfAtomicCounter`/`PerfAtomicLongCounter` class, or perform `Atomic::add()` on the `PerfData::_valuep` pointer. There's already `PerfData::get_address()`, so we might be able to do: Atomic::add((volatile jlong *)(instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->get_address()), net_cpu_time); However, a new class `PerfAtomicCounter` is likely cleaner. E.g., we may also want to make `PerfAtomicCounter::sample()` use a CAS. It is probably better to introduce `PerfAtomicCounter` in a separate RFE later. Would the `Atomic::add()` with `PerfData::get_address()` approach be OK for now, or would we rather introduce a lock, or leave the `gc_total` mechanism as-is and address the out-of-sync-ness in a follow-up RFE? IMO the out-of-sync-ness problem is minor, because users are likely to either look at a single `gc_total` counter, or look at each individual GC CPU counter and disregard `gc_total`. - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1412569208
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Thu, 30 Nov 2023 21:43:36 GMT, Man Cao wrote: >> @simonis was the original suggester of this counter, so I will defer to his >> expertise. I do agree that dropping the counter would simplify things, but >> it also might not hurt to just leave it in. I'm okay with either option! > > Right, see @simonis 's comments at > https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, > https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912. > > I initially had similar thought that `gc_total` isn't necessary and provides > redundant data. Now I agree with @simonis that the `gc_total` is valuable to > users. It saves users from extra work of aggregating different sets of > counters for different garbage collectors, and potential mistakes of missing > some counters. It is also future-proof that if GC implementation changes that > add additional threads, users wouldn't need to change their code to add the > counter for additional threads. > > I think the maintenance overhead is quite small for `gc_total` since it is > mostly in this class. The benefit to users is worth it. I agree that the counter is valuable if always up-to-date, but if it is out of sync compared to the "concurrent counters" I think it will confuse some users. So if we want to keep it I think we should try to keep it in sync. I suppose adding a lock for updating `gc_total` should be ok. In the safepoint case we should never contend on that lock and for the concurrent updates it should not be a big deal. Basically what I think would be to change `update_counter(...)` to do something like: if (CPUTimeGroups::is_gc_counter(name)) { instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(net_cpu_time); } This way we would also be able to remove the publish part above, right? Any other problems with this approach? - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411897115
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Thu, 30 Nov 2023 21:17:14 GMT, Jonathan Joo wrote: >> Me and Albert just spoke and we do see the problem that two concurrent >> threads could be executing the closure at the same time. So if having a >> total counter we need to sync the updates. But when talking we started to >> question how useful it is to have the gc_total counter. It is just an >> aggregate of the other gc-counters, but it is out of sync between >> safepoints. So you will always get a more accurate value by looking at the >> individual gc-counters. >> >> We came to the conclusion that it would probably be easier to drop >> `gc_total` right now, rather than trying to keep it in sync for all updates >> to the individual counters. Because having them out of sync doesn't feel >> like a great option. >> >> Are we missing anything or do you agree? > > @simonis was the original suggester of this counter, so I will defer to his > expertise. I do agree that dropping the counter would simplify things, but it > also might not hurt to just leave it in. I'm okay with either option! Right, see @simonis 's comments at https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912. I initially had similar thought that `gc_total` isn't necessary and provides redundant data. Now I agree with @simonis that the `gc_total` is valuable to users. It saves users from extra work of aggregating different sets of counters for different garbage collectors, and potential mistakes of missing some counters. It is also future-proof that if GC implementation changes that add additional threads, users wouldn't need to change their code to add the counter for additional threads. I think the maintenance overhead is quite small for `gc_total` since it is mostly in this class. The benefit to users is worth it. - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411293819
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Thu, 30 Nov 2023 09:30:14 GMT, Stefan Johansson wrote: >> Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called >> by the vm-thread inside a safepoint, so there shouldn't be any other threads >> running simultaneously, I believe. > > Me and Albert just spoke and we do see the problem that two concurrent > threads could be executing the closure at the same time. So if having a total > counter we need to sync the updates. But when talking we started to question > how useful it is to have the gc_total counter. It is just an aggregate of the > other gc-counters, but it is out of sync between safepoints. So you will > always get a more accurate value by looking at the individual gc-counters. > > We came to the conclusion that it would probably be easier to drop `gc_total` > right now, rather than trying to keep it in sync for all updates to the > individual counters. Because having them out of sync doesn't feel like a > great option. > > Are we missing anything or do you agree? @simonis was the original suggester of this counter, so I will defer to his expertise. I do agree that dropping the counter would simplify things, but it also might not hurt to just leave it in. I'm okay with either option! - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411270546
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Thu, 30 Nov 2023 06:45:02 GMT, Albert Mingkun Yang wrote: >> I agree that in the case that multiple closures are not active at the same >> time, we wouldn't have to implement it in this way. However, isn't it >> possible to have multiple closures active simultaneously, e.g. vm thread, >> concurrent mark thread, concurrent refine thread? I think to account for the >> races there, we can only update the `_gc_total_cpu_time_diff` member >> variable atomically during these closure destructions, and then publish the >> actual updated `gc_total` counter at manually specified times via >> `publish_gc_total_cpu_time()`. If we were to call >> `publish_gc_total_cpu_time` as part of the thread closure, I believe it >> would be difficult to prevent races when accessing the underlying counter >> from the various gc-related threads. >> >> Or maybe there is another strategy that I'm not seeing? > > Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called > by the vm-thread inside a safepoint, so there shouldn't be any other threads > running simultaneously, I believe. Me and Albert just spoke and we do see the problem that two concurrent threads could be executing the closure at the same time. So if having a total counter we need to sync the updates. But when talking we started to question how useful it is to have the gc_total counter. It is just an aggregate of the other gc-counters, but it is out of sync between safepoints. So you will always get a more accurate value by looking at the individual gc-counters. We came to the conclusion that it would probably be easier to drop `gc_total` right now, rather than trying to keep it in sync for all updates to the individual counters. Because having them out of sync doesn't feel like a great option. Are we missing anything or do you agree? - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410394993
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Thu, 30 Nov 2023 02:39:39 GMT, Jonathan Joo wrote: >> This two-step update does seem unnecessary, IMO. > > I agree that in the case that multiple closures are not active at the same > time, we wouldn't have to implement it in this way. However, isn't it > possible to have multiple closures active simultaneously, e.g. vm thread, > concurrent mark thread, concurrent refine thread? I think to account for the > races there, we can only update the `_gc_total_cpu_time_diff` member variable > atomically during these closure destructions, and then publish the actual > updated `gc_total` counter at manually specified times via > `publish_gc_total_cpu_time()`. If we were to call `publish_gc_total_cpu_time` > as part of the thread closure, I believe it would be difficult to prevent > races when accessing the underlying counter from the various gc-related > threads. > > Or maybe there is another strategy that I'm not seeing? Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called by the vm-thread inside a safepoint, so there shouldn't be any other threads running simultaneously, I believe. - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410221646
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Wed, 29 Nov 2023 15:24:52 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/runtime/cpuTimeCounters.cpp line 91: >> >>> 89: } while (old_value != fetched_value); >>> 90: get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value); >>> 91: } >> >> Why do we have to do this publish dance? Couldn't the closure that update >> the diff instead just update the counter? From what I can tell we never have >> multiple closures active at the same time so should be no race there? > > This two-step update does seem unnecessary, IMO. I agree that in the case that multiple closures are not active at the same time, we wouldn't have to implement it in this way. However, isn't it possible to have multiple closures active simultaneously, e.g. vm thread, concurrent mark thread, concurrent refine thread? I think to account for the races there, we can only update the `_gc_total_cpu_time_diff` member variable atomically during these closure destructions, and then publish the actual updated `gc_total` counter at manually specified times via `publish_gc_total_cpu_time()`. If we were to call `publish_gc_total_cpu_time` as part of the thread closure, I believe it would be difficult to prevent races when accessing the underlying counter from the various gc-related threads. Or maybe there is another strategy that I'm not seeing? - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410089070
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Wed, 29 Nov 2023 08:24:22 GMT, Stefan Johansson wrote: >> Jonathan Joo has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - Fix namespace issues (2) >> >>Co-authored-by: Stefan Johansson >> <54407259+kstef...@users.noreply.github.com> >> - Fix namespace issues >> >>Co-authored-by: Stefan Johansson >> <54407259+kstef...@users.noreply.github.com> > > src/hotspot/share/runtime/cpuTimeCounters.cpp line 91: > >> 89: } while (old_value != fetched_value); >> 90: get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value); >> 91: } > > Why do we have to do this publish dance? Couldn't the closure that update the > diff instead just update the counter? From what I can tell we never have > multiple closures active at the same time so should be no race there? This two-step update does seem unnecessary, IMO. - PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1409452326
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
On Tue, 28 Nov 2023 02:22:45 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional > commits since the last revision: > > - Fix namespace issues (2) > >Co-authored-by: Stefan Johansson > <54407259+kstef...@users.noreply.github.com> > - Fix namespace issues > >Co-authored-by: Stefan Johansson > <54407259+kstef...@users.noreply.github.com> >From a testing point of view I think this is looking good now, re-ran the >failing test and some other jstat tests as well and they all pass. Please address the comments from Albert and we can hopefully finish this before RDP1. src/hotspot/share/runtime/cpuTimeCounters.cpp line 91: > 89: } while (old_value != fetched_value); > 90: get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value); > 91: } Why do we have to do this publish dance? Couldn't the closure that update the diff instead just update the counter? From what I can tell we never have multiple closures active at the same time so should be no race there? - PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1754686415 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1408914724
Re: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
> 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: - Fix namespace issues (2) Co-authored-by: Stefan Johansson <54407259+kstef...@users.noreply.github.com> - Fix namespace issues Co-authored-by: Stefan Johansson <54407259+kstef...@users.noreply.github.com> - Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/fcc7e471..abb90258 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=15082=47 - incr: https://webrevs.openjdk.org/?repo=jdk=15082=46-47 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082