On Fri, 5 Jun 2026 17:50:34 GMT, Andrew Haley wrote:

>> src/hotspot/share/c1/c1_LIRGenerator.cpp line 969:
>> 
>>> 967:     LIR_Opr tmp = new_register(T_INT);
>>> 968:     LIR_Opr step = LIR_OprFact::intConst(DataLayout::counter_increment);
>>> 969:     __ increment_counter(step, tmp, md_reg, md->constant_encoding(), data_offset_reg);
>> 
>> OK, so this covers the C1 profiling path. The interpreter profiling path
>> (`InterpreterMacroAssembler::profile_taken_branch`) is still not covered? So
>> if we are mixing the subsampled counters from C1 with raw counters from the
>> interpreter, does that skew the profiling results? In the long run this is
>> probably not a problem? But I am not sure we are actually profiling any
>> particular bci for long enough to mitigate this.
>> 
>> What I am concerned about is the profile inversion. If there is a hot
>> branch, it would probably progress to C1 profiling, where it would be
>> subsampled, and would add up in the profile only after some time. But the
>> _cold_ branch that is running in the interpreter would show up in the profile
>> right away. So there is a time window where the cold branch is
>> over-represented relative to the hot branch in the profile. With a large
>> `ProfileCaptureRatio` this window can be uncomfortably large and fairly close
>> to triggering the compilation with a skewed profile?
>> 
>> It is not much of a problem with receiver type profiling, where the
>> interpreter and C1 are on the same subsampling footing.
>
>> What I am concerned about is the profile inversion. If there is a hot
>> branch, it would probably progress to C1 profiling, where it would be
>> subsampled, and would add up in the profile only after some time.
> 
> Some compilations will be delayed, and some will be advanced, with 
> approximately equal probability.
> 
>> But the _cold_ branch that is running in the interpreter would show up in the
>> profile right away. So there is a time window where the cold branch is
>> over-represented relative to the hot branch in the profile.
> 
> Randomized profile counters add some noise to profile counts. The counters
> can trigger overflow earlier _or later_ than they would have done without
> randomization. I think the noise added to profile counters has something like
> a Poisson distribution, so the typical error (one standard deviation) is about
> √N, where N is the number of random samples. If you have a compilation
> threshold of 1024, it'll take on average 1024 sampling events to trigger
> compilation, regardless of `ProfileCaptureRatio`. With a `ProfileCaptureRatio`
> of 64, on average only 16 of those 1024 events will actually increment the
> associated counter, each time by 64. √16 = 4, so the count at which
> compilation is signalled is 1024 ± 256 (that is, 4 × 64) for one standard
> deviation.
> 
>> With a large `ProfileCaptureRatio` this window can be uncomfortably large and
>> fairly close to triggering the compilation with a skewed profile?
>>
>> It is not much of a problem with receiver type profiling, where the
>> interpreter and C1 are on the same subsampling footing.

The standard deviation of a Poisson distribution is the square root of its mean.
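Applying that to the example above, with r = `ProfileCaptureRatio` = 64, N = 1024 events, K the number of captured samples (assumed approximately Poisson with mean N/r = 16) and C = rK the counter value (these symbols are introduced here only for illustration):

```latex
\begin{aligned}
K &\sim \mathrm{Poisson}(N/r), \qquad C = rK \\
\mathbb{E}[C] &= r \cdot \frac{N}{r} = N \\
\operatorname{Var}(C) &= r^2 \cdot \frac{N}{r} = rN \\
\sigma_C &= \sqrt{rN} = \sqrt{64 \times 1024} = 256 \\
\frac{\sigma_C}{\mathbb{E}[C]} &= \sqrt{\frac{r}{N}} = \frac{1}{4}
\end{aligned}
```

This is the same ±256 as above: √16 = 4 captured samples, each worth 64 counts.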

(But I am not a statistician. Having said that, I have run a few simulations 
which seem to fit the above.)
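A minimal C++ sketch of that kind of simulation, for anyone who wants to reproduce the numbers. It is not the simulation referred to above, and the counter model (capture each event with probability 1/ratio, add `ratio` when captured) is an assumption for illustration rather than HotSpot's actual counter code. It fires a fixed number of events into a fresh counter many times and reports the mean and standard deviation of the final counts.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

struct Stats {
  double mean;
  double stddev;
};

// Hypothetical subsampled-counter model (an assumption, not HotSpot code):
// each event is captured with probability 1/ratio, and a captured event adds
// `ratio`, so the expected increment per event is 1.
// Fires `events` events into a fresh counter, repeats `trials` times, and
// returns the mean and standard deviation of the final counter values.
Stats simulate(uint32_t events, uint32_t ratio, uint32_t trials, uint64_t seed) {
  assert(ratio >= 1 && trials > 0);
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<uint32_t> pick(0, ratio - 1);
  double sum = 0.0;
  double sum_sq = 0.0;
  for (uint32_t t = 0; t < trials; t++) {
    uint64_t counter = 0;
    for (uint32_t e = 0; e < events; e++) {
      if (pick(rng) == 0) {  // captured with probability 1/ratio
        counter += ratio;    // scale up so the count stays unbiased
      }
    }
    double v = static_cast<double>(counter);
    sum += v;
    sum_sq += v * v;
  }
  double mean = sum / trials;
  double var = sum_sq / trials - mean * mean;
  return {mean, std::sqrt(var > 0.0 ? var : 0.0)};
}
```

For example, `simulate(1024, 64, 5000, 42)` should give a mean near 1024 and a standard deviation near 254, the exact binomial value √(1024 × 63), which the Poisson estimate of 256 approximates well. With a ratio of 1 every event is counted, so the spread is zero.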

With the very low thresholds typical of the interpreter, using the same
`ProfileCaptureRatio` for both the interpreter and C1 would surely lead to
compilations happening in a very different order.
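To put rough numbers on the threshold effect: under the same Poisson approximation the relative spread of a subsampled counter is √(ratio / threshold). The helper below is a hypothetical illustration, and the thresholds in the example calls are illustrative values, not HotSpot's actual interpreter or C1 thresholds.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative standard deviation (sigma / mean) of a
// subsampled counter after `threshold` events, using the Poisson
// approximation sqrt(ratio / threshold). When ratio >= threshold, the
// expected number of captured samples (threshold / ratio) is at most 1.
double relative_noise(double threshold, double ratio) {
  assert(threshold > 0.0 && ratio >= 1.0);
  return std::sqrt(ratio / threshold);
}
```

For example, `relative_noise(1024, 64)` is 0.25, while an illustrative threshold of 128 with the same ratio gives about 0.71, and a threshold of 64 gives 1.0: the counter is then uncertain by about its whole value.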

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28541#discussion_r3364434618
