On Fri, Mar 26, 2021 at 11:53 PM Zhou Ti (x2019cwm) <x2019...@stfx.ca> wrote: > > On Fri, 26 Mar 2021 19:54:26 +0100, Rafael J. Wysocki wrote: > > On Fri, Mar 26, 2021 at 6:53 PM Zhou Ti (x2019cwm) <x2019...@stfx.ca> wrote: > > > > > > On Fri, 26 Mar 2021 18:01:47 +0100, Rafael J. Wysocki wrote: > > > > On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019...@stfx.ca> > > > > wrote: > > > > > > > > > > On March 25, 2021 15:50, Rafael J. Wysocki wrote: > > > > > > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) > > > > > > <x2019...@stfx.ca> wrote: > > > > > > > > > > > > > > On March 25, 2021 14:56, Rafael J. Wysocki wrote: > > > > > > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker > > > > > > > > wrote: > > > > > > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) > > > > > > > > > wrote: > > > > > > > > > > But I don't think it's a good idea to handle this in > > > > > > > > > > callers, because logically the function shouldn't return > > > > > > > > > > negative values. Returning 0 directly would allow idle > > > > > > > > > > governors to get another chance to select again. > > > > > > > > > > > > > > > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle > > > > > > > > > are the only > > > > > > > > > callers of this. In any case we need to fix it. > > > > > > > > > > > > > > > > Yes, we do. > > > > > > > > > > > > > > > > So I said that I preferred to address this in the callers and > > > > > > > > the reason why > > > > > > > > is because, for example, for the teo governor it would be a > > > > > > > > matter of using > > > > > > > > a different data type to store the tick_nohz_get_sleep_length() > > > > > > > > return value, > > > > > > > > like in the (untested) patch below. > > > > > > > > > > > > > > > > So at least in this case there is no need to add any new > > > > > > > > branches anywhere. > > > > > > > > > > > > > > > > I'm still not sure about menu, because it is more complicated, > > > > > > > > but even if > > > > > > > > that one needs an extra branch, that is a win already. > > > > > > > > > > > > > > I would like to point out the potential trouble that fixing this > > > > > > > issue in the > > > > > > > callers could cause. > > > > > > > > > > > > > > 1. This function is called multiple times in menu governor and TEO > > > > > > > governor. > > > > > > > > > > > > What do you mean by "multiple times"? > > > > > > > > > > > > Each of the governors calls it once per cycle and its previous > > > > > > return > > > > > > value is not used in the next cycle at least in teo. > > > > > > > > > > I remember a governor called this function twice in a cycle, I guess > > > > > I remember > > > > > wrong. > > > > > > > > That obviously depends on the governor, but both teo and menu call it > > > > once per cycle. > > > > > > > > > > > I'm not sure that receiving results using signed integers is > > > > > > > enough > > > > > > > to solve all the problems, in the worst case it may require > > > > > > > increasing > > > > > > > the logical complexity of the code. > > > > > > > > > > > > That is a valid concern, so it is a tradeoff between increasing the > > > > > > logical complexity of the code and adding branches to it. > > > > > > > > > > > > > 2. This function is important for developing idle governor. > > > > > > > If the problem is not fixed in the function itself, then this > > > > > > > potential > > > > > > > pitfall should be explicitly stated in the documentation. > > > > > > > > > > > > That I can agree with. > > > > > > > > > > > > > This is because > > > > > > > it is difficult to predict from the definition and naming of the > > > > > > > function > > > > > > > that it might return a negative number. I actually discovered > > > > > > > this anomaly > > > > > > > when I was doing data analysis on my own idle governor. For some > > > > > > > idle control > > > > > > > algorithms, this exception return could lead to serious > > > > > > > consequences, > > > > > > > because negative return logically won't happen. > > > > > > > > > > > > Well, it's a matter of how to take the possible negative return > > > > > > value > > > > > > into account so it does not affect the result of the computations. > > > > > > > > > > I think it is challenging for some algorithms to take negative return > > > > > values > > > > > into account properly. For TEO (and even menu), it is possible to > > > > > solve the problem by just changing the way the data is received is > > > > > because the > > > > > learning mechanism for both algorithms is simple. > > > > > > > > Of course this depends on the governor. > > > > > > > > > One of the interesting things about the CPUIdle subsystem is that it > > > > > is well > > > > > suited to introduce machine learning and probabilistic statistical > > > > > methods. > > > > > > > > You need to remember that the governor code runs in the idle loop > > > > context which is expected to be reasonably fast. > > > > > > > > That's why we are worrying about individual branches here. > > > > > > > > > This means that many of the more complex and data-sensitive > > > > > algorithms can > > > > > potentially be explored. In the best case we will still need to add > > > > > additional > > > > > code complexity to a new algorithm. > > > > > > > > So I'm not sure what the problem with adding an upfront negative value > > > > check to the governor is. > > > > > > > > > It would reduce a lot of unnecessary considerations (for example, > > > > > highlight > > > > > this shortcoming in the documentation) if we could ensure that this > > > > > function > > > > > would work as it is logically defined. But I don't really understand > > > > > how much of a burden adding an extra branch would impose, so I don't > > > > > know if > > > > > this tradeoff is worth it. > > > > > > > > It ultimately depends on the governor, which is why I think that the > > > > negative value check should be done by the governor, if needed, and > > > > not by the function called by it, because in the latter case the check > > > > may be redundant and we end up with an extra branch (or two branches > > > > in this particular case) for no good reason whatsoever. > > > > > > > > Yes, there are governors which simply can do the negative value check > > > > upfront right after calling that function and ensure that they will > > > > not deal with negative values going forward. This is probably what > > > > I'll do in the menu case. > > > > > > > > However, if the governor is simple enough and it can avoid doing the > > > > explicit negative value check, I don't see a reason to do that check > > > > elsewhere "just in case". > > > > > > Makes sense. I will submit my patch to fix this issue in menu and TEO. > > > > Well, I have patches for that already and they are not > > super-straightforward. Though If you want to try to fix this > > yourself, I'll wait for your submission. > > Thanks! I really like this subsystem, so I hope to contribute a little. > I still have some questions. > > For TEO governor: > Even if we change some datatypes as your patch did before, some explicit > type conversions still need to be added to prevent wrong results.
That's true. That patch was way incomplete and I would start with changing the data type of exit_latency_ns and target_residency_ns in struct cpuidle_state to s64 for this reason. > For example: > line 276 (because 1u > -1 will be false) > line 329 > line 422 > > Since some of them are in a loop, does the overhead caused by the > type conversions worth it? or do I need to do some pre-processing to avoid > duplicate conversions (which may cause additional space overhead) ? Well, it looks like it may be better if I send my patches after all, because it will take less time overall than me explaining here what I would do. Let me do that. > For menu governor: > If we simply change the datatypes, the conversions required are even more. I would add a negative return value check to menu as I said before. > For example: > lines 133-142 > line 202 > line 289 > lines 302-309 > line 320 > lines 327-328 > line 347 > > If we just do exception handling on next_timer_ns and delta_next, > that still requires two additional branches. Not really AFAICS, but again let me send a patch to illustrate my point.