On 06/09/2019 18:10, Parth Shah wrote:
> Right, CPU capacity can solve the problem of indicating the thermal throttle
> to the scheduler.
> AFAIU, the patchset from Thara changes CPU capacity to reflect Thermal
> headroom of the CPU.
> This is a nice mitigation but,
> 1. Sometimes a single task is responsible for the Thermal heatup of the core,
> reducing the CPU capacity of all the CPUs in the core is not optimal when
> just moving such single task to other core can allow us to remain within
> thermal headroom. This is important for the servers especially where there
> are upto 8 threads.
> 2. Given the implementation in the patches and its integration with EAS, it
> seems difficult to adapt to servers, where CPU capacity itself is in doubt.
> https://lkml.org/lkml/2019/5/15/1402
>
I'd nuance this to *SMT* capacity (which isn't just servers). The thing is
that it's difficult to come up with a sensible scheme to describe the base
capacity of a single logical CPU. But yeah, valid point.

>>
>> For active balance, we actually already have a condition that moves a task
>> to a less capacity-pressured CPU (although it is somewhat specific). So if
>> thermal pressure follows that task (e.g. it's doing tons of vector/float),
>> it will be rotated around.
>
> Agree. But this should break in certain conditions like when we have
> multiple tasks in a core with almost equal utilization among which one is
> just doing vector operations. LB can pick and move any task with equal
> probability if the capacity is reduced here.
>

Right, if/when we get things like per-unit signals (wasn't there something
about tracking AVX a few months back?) then we'll be able to make more
informed decisions; for now we'll need some handholding (read: task
classification).

>>
>> However there should be a point made on latency vs throughput. If you
>> care about latency you probably do not want to active balance your task. If
>
> Can you please elaborate on why not to consider active balance for latency
> sensitive tasks?
> Because, sometimes finding a thermally cool core is beneficial when Turbo
> frequency range is around 20% above rated ones.
>

This goes back to my reply to Patrick further up the thread. Right now
active balance can happen just because we've been imbalanced for some time
and repeatedly failed to migrate anything. After 3 (IIRC) successive failed
attempts, we'll active balance the running task of the remote rq we decided
was busiest.

If that happens to be a latency-sensitive task, that's not great - active
balancing means stopping that task's execution, so we're going to add some
latency to this latency-sensitive task. My proposal was to further
ratelimit active balance (e.g. require more failed attempts) when the task
that would be preempted is latency-sensitive (there's a rough sketch of
what I mean at the end of this mail).

My point is: if that task is doing fine where it is, why preempt it? That's
just introducing latency IMO (keeping in mind that those balance attempts
could happen despite not having any thermal pressure). If you care about
performance (e.g. a minimum level of throughput), to me that is a separate
(though perhaps not entirely distinct) property.

>> you care about throughput, it should be specified in some way (util-clamp
>> says hello!).
>>
>
> yes I do care for latency and throughput both. :-)

Don't we all!

> but I'm wondering how uclamp can solve the problem for throughput.
> If I make the thermally hot tasks to appear bigger than other tasks then
> reducing CPU capacity can allow such tasks to move around the chip.
> But this will require the utilization value to be relatively large compared
> to the other tasks in the core. Or other task's uclamp.max can be lowered
> to make such task rotate.
> If I got it right, then this will be a difficult UCLAMP usecase from user
> perspective, right?
> I feel like I'm missing something here.
>

Hmm, perhaps I was jumping the gun here. What I was getting to is: if you
have something like misfit that migrates tasks to CPUs of higher capacity
than the one they are on, you could use uclamp to flag them. You could
translate your throughput requirement as a uclamp.min of e.g. 80%, and if
the CPU capacity goes below that (or close within a margin) then you'd try
to migrate the task to a CPU of higher capacity (i.e. not or less
thermally pressured).
This doesn't have to involve your less throughput-sensitive tasks, since
you'd only tag and take action on the throughput-sensitive ones.
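
To make that last bit a little more concrete, here's a rough, purely
illustrative userspace model of the kind of check I have in mind. The
struct fields and helpers (thermal_pressure, cpu_capacity(), fits(),
task_is_misfit()) are made up for the example; in the kernel the relevant
pieces would be uclamp_eff_value(), capacity_of() and the misfit machinery
in fair.c:

/*
 * Illustrative userspace model only - not kernel code. Treat uclamp.min
 * as a throughput floor and flag the task as "misfit" when the
 * thermally-pressured CPU capacity can no longer honour it.
 */
#include <stdbool.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024

struct task {
        unsigned long util;        /* estimated utilization */
        unsigned long uclamp_min;  /* throughput requirement, e.g. 80% -> 819 */
};

struct cpu {
        unsigned long max_capacity;      /* original capacity of the CPU */
        unsigned long thermal_pressure;  /* capacity lost to throttling */
};

/* Capacity currently available on the CPU, thermal pressure removed */
static unsigned long cpu_capacity(const struct cpu *c)
{
        return c->max_capacity - c->thermal_pressure;
}

/* Require ~20% headroom above the request, like the fits-checks in fair.c */
static bool fits(unsigned long request, unsigned long capacity)
{
        return request * 1280 < capacity * 1024;
}

/*
 * A task is "misfit" on its CPU if its clamped utilization (i.e. at least
 * uclamp.min) no longer fits the remaining capacity - that would be the
 * trigger to go look for a cooler (or bigger) CPU.
 */
static bool task_is_misfit(const struct task *p, const struct cpu *c)
{
        unsigned long request = p->util > p->uclamp_min ? p->util : p->uclamp_min;

        return !fits(request, cpu_capacity(c));
}

int main(void)
{
        struct task p    = { .util = 300, .uclamp_min = 819 };  /* ~80% */
        struct cpu  hot  = { .max_capacity = 1024, .thermal_pressure = 200 };
        struct cpu  cool = { .max_capacity = 1024, .thermal_pressure = 0 };

        printf("misfit on hot CPU:  %d\n", task_is_misfit(&p, &hot));   /* 1 */
        printf("misfit on cool CPU: %d\n", task_is_misfit(&p, &cool));  /* 0 */
        return 0;
}

The 1280/1024 factor just mirrors the usual ~20% capacity margin; the exact
margin (and whether uclamp.min is the right thing to compare at all) is of
course up for debate.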
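
And the active balance ratelimiting point further up, sketched the same way.
Again illustrative only: the latency_sensitive flag is hypothetical, the
extra bump is the proposal, and only the cache_nice_tries + 2 threshold
mirrors the existing check in need_active_balance():

#include <stdbool.h>
#include <stdio.h>

struct sched_domain_model {
        unsigned int nr_balance_failed;  /* consecutive failed balance attempts */
        unsigned int cache_nice_tries;   /* per-domain tolerance, typically 1-2 */
};

struct task_model {
        bool latency_sensitive;          /* hypothetical per-task attribute */
};

/*
 * Mimics the "have we failed often enough to justify preempting the running
 * task" part of need_active_balance(); the latency-sensitive bump is the
 * proposed change, not existing behaviour.
 */
static bool should_active_balance(const struct sched_domain_model *sd,
                                  const struct task_model *running)
{
        unsigned int threshold = sd->cache_nice_tries + 2;

        if (running->latency_sensitive)
                threshold += 2;  /* arbitrary: demand more failed attempts first */

        return sd->nr_balance_failed > threshold;
}

int main(void)
{
        struct sched_domain_model sd = { .nr_balance_failed = 4, .cache_nice_tries = 1 };
        struct task_model normal = { .latency_sensitive = false };
        struct task_model lat    = { .latency_sensitive = true };

        printf("preempt normal task:            %d\n", should_active_balance(&sd, &normal)); /* 1 */
        printf("preempt latency-sensitive task: %d\n", should_active_balance(&sd, &lat));    /* 0 */
        return 0;
}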