On 06/08/2018 10:59, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-08-06 10:34:54)

On 06/08/2018 09:30, Chris Wilson wrote:
If we are waiting for the currently executing request, we have a good
idea that it will be completed in the very near future and so want to
cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.
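
For context, a minimal sketch of the mechanism in question, assuming the
stock pm_qos API from linux/pm_qos.h; the function and its placement are
illustrative only, not the actual patch:

#include <linux/pm_qos.h>

/*
 * Illustrative only: bracket the wait with a CPU_DMA_LATENCY cap of 0 so
 * that cpuidle avoids deep C-states while we expect an imminent wakeup.
 */
static void wait_with_latency_cap(void)
{
        struct pm_qos_request qos;

        pm_qos_add_request(&qos, PM_QOS_CPU_DMA_LATENCY, 0);

        /* ... sleep until the request's seqno signals ... */

        pm_qos_remove_request(&qos);
}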

I cannot shake the opinion that we shouldn't be doing this. For instance
what if the client has been re-niced (down), or it has re-niced itself?
Obviously wrong to apply this for those.

Niceness only restricts the task's position on the scheduler runqueue; it
doesn't actually have any cpufreq implications (give or take RT heuristics).
So I don't think we need a tsk->prio restriction.

I was thinking that such a client obviously doesn't care about latency or anything like that (more or less), so we would be incorrectly applying the PM QoS request.

Or take the claim that we have a good idea something will be completed in
the very near future: say there is a 60fps workload which sends 5ms batches
and waits on them. That would be roughly 30% of the time (5ms out of each
~16.7ms frame) spent outside of low C-states for a workload which doesn't
need it.

Quite frankly, they shouldn't be using wait on the current frame. For
example, in mesa you wait for the end of the previous frame which should
be roughly complete, and since it is a stall before computing the next,
latency is still important.

Maybe, but I think we shouldn't go into the details of how a client might use it, and create limitations / hidden gotchas at this level if they do not behave as we expect / prescribe.

But I have noticed that so far you have been avoiding commenting on the idea of an explicit flag. :)

Regards,

Tvrtko

Also having read what the OpenCL does, where they want to apply
different wait optimisations for different call-sites, the idea that we
should instead be introducing a low-latency flag to wait ioctl sounds
more appropriate.
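
To make that concrete, a hedged sketch of what such a flag might look like
from userspace; struct drm_i915_gem_wait and DRM_IOCTL_I915_GEM_WAIT are the
existing uapi, while the I915_GEM_WAIT_LOW_LATENCY bit is purely
hypothetical:

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Hypothetical flag bit, not part of the real uapi. */
#define I915_GEM_WAIT_LOW_LATENCY (1u << 0)

/*
 * Userspace sketch: wait on a buffer and explicitly opt in to the
 * low-latency treatment instead of having the kernel guess.
 */
static int gem_wait_low_latency(int fd, uint32_t handle)
{
        struct drm_i915_gem_wait wait = {
                .bo_handle  = handle,
                .flags      = I915_GEM_WAIT_LOW_LATENCY,
                .timeout_ns = -1,       /* wait indefinitely */
        };

        return drmIoctl(fd, DRM_IOCTL_I915_GEM_WAIT, &wait);
}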

I'm not impressed by what I've heard there yet. There's also the
dilemma with what to do with dma-fence poll().

+             if (!qos &&
+                 i915_seqno_passed(intel_engine_get_seqno(rq->engine),
+                                   wait.seqno - 1))

I also realized that this will get incorrectly applied when there is
preemption. If a low-priority request gets preempted after we have applied
the PM QoS, it will persist for much longer than intended (until the
high-prio request completes and then the low-prio one). And the explicit
low-latency wait flag would have the same problem. We could perhaps go
with removing the PM QoS request if preempted. It should not be frequent
enough to cause issues with too much traffic on the API. But

Sure, I didn't think it was worth worrying about. We could cancel it and
reset it on next execution.
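
A rough sketch of that cancel-and-rearm idea; the hook points named here are
hypothetical and only the pm_qos calls are the real API:

#include <linux/pm_qos.h>

/*
 * Hypothetical hook: the waiter notices its request has been preempted
 * off the engine, so drop the latency cap until it runs again.
 */
static void wait_qos_cancel(struct pm_qos_request *qos)
{
        if (pm_qos_request_active(qos))
                pm_qos_remove_request(qos);
}

/*
 * Hypothetical hook: the request has been resubmitted to hardware, so
 * reinstate the latency cap for the remainder of the wait.
 */
static void wait_qos_rearm(struct pm_qos_request *qos)
{
        if (!pm_qos_request_active(qos))
                pm_qos_add_request(qos, PM_QOS_CPU_DMA_LATENCY, 0);
}
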
Another side note - a quick grep shows there are a few other "seqno - 1"
callsites, so perhaps we should add a helper for this with a more
self-explanatory name like __i915_seqno_is_executing(engine, seqno) or something?
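
Something like this, perhaps - the name comes from the suggestion above and
is not an existing function, while i915_seqno_passed() and
intel_engine_get_seqno() are the calls already used in the quoted hunk:

/*
 * Hypothetical helper: the request is on the hardware (or already
 * completed) once the engine has completed the seqno immediately
 * preceding it.
 */
static inline bool
__i915_seqno_is_executing(struct intel_engine_cs *engine, u32 seqno)
{
        return i915_seqno_passed(intel_engine_get_seqno(engine), seqno - 1);
}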

I briefly considered something along those lines,
intel_engine_has_signaled(), intel_engine_has_started(). I also noticed
that I didn't kill i915_request_started even though I thought we had.
-Chris
