guc: default to using GuC submission where possible

Dave Gordon Mon, 25 Apr 2016 03:07:42 -0700

On 22/04/16 19:45, Chris Wilson wrote:

On Fri, Apr 22, 2016 at 07:22:55PM +0100, Dave Gordon wrote:

This patch simply changes the default value of "enable_guc_submission"
from 0 (never) to -1 (auto). This means that GuC submission will be
used if the platform has a GuC, the GuC supports the request submission
protocol, and any required GuC firmware was successfully loaded. If any
of these conditions are not met, the driver will fall back to using
execlist mode.
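
The "auto" behaviour described in the patch summary above can be sketched roughly as follows. This is only an illustration, not the driver's code: struct guc_caps, its fields, and use_guc_submission() are hypothetical names, and the handling of positive parameter values is an assumption.

```c
#include <stdbool.h>

/* Hypothetical summary of what the driver knows about the platform. */
struct guc_caps {
	bool has_guc;              /* platform has a GuC at all */
	bool supports_submission;  /* GuC supports the submission protocol */
	bool firmware_loaded;      /* required firmware loaded successfully */
};

/*
 * Resolve the parameter: 0 = never, -1 = auto.
 * Assumption: any other nonzero value is treated like auto here,
 * so an unusable GuC always falls back to execlist mode.
 */
static bool use_guc_submission(int enable_guc_submission,
			       const struct guc_caps *caps)
{
	if (enable_guc_submission == 0)
		return false;

	/* All three conditions must hold, otherwise use execlists. */
	return caps->has_guc && caps->supports_submission &&
	       caps->firmware_loaded;
}
```

With all three capabilities set, a value of -1 selects GuC submission; clearing any one of them, or passing 0, selects execlist mode.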


There are several shortcomings yet, in particular you've decided to add
new ABI...

i915_guc_wq_check_space(): Do not return ETIMEDOUT. This function's
return code is ABI.

Your choices are either EAGAIN if you think the hardware will catch up, or
more likely EIO and reset the hardware.

This layer doesn't have enough information to determine that we need a
reset, so it'll have to be EAGAIN. TDR should catch the case where we
continue to be unable to submit for an extended period, for whatever
reason.

I don't like the spinwait anyway, so maybe we should just return the
immediate result of sampling the WQ space once, and let the upper layers
deal with how or whether to wait-and-retry.
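
A minimal sketch of the "sample once" idea, assuming a power-of-two circular work queue. struct wq_state, wq_space() and wq_check_space_once() are hypothetical names for illustration, not the functions in the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical work queue state; size must be a power of two. */
struct wq_state {
	uint32_t head;  /* consumer offset, advanced by the firmware */
	uint32_t tail;  /* producer offset, advanced by the driver */
	uint32_t size;  /* total bytes in the ring */
};

/* Free bytes in the ring; one byte stays unused so full != empty. */
static uint32_t wq_space(const struct wq_state *wq)
{
	return (wq->head - wq->tail - 1) & (wq->size - 1);
}

/*
 * Sample the space exactly once: 0 if the item fits, -EAGAIN if not.
 * No spinning and no -ETIMEDOUT; the caller decides whether to retry.
 */
static int wq_check_space_once(const struct wq_state *wq, uint32_t needed)
{
	return wq_space(wq) >= needed ? 0 : -EAGAIN;
}
```

The upper layer would then own the policy: retry later, wait, or leave it to TDR if the queue never drains.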

guc_add_workqueue_item: cannot fail

It must be fixed such that it is a void function without possibility of
failure.

The code clearly indicates that this "shouldn't happen", and yes, it
would indicate an inconsistency in the internal state of the driver if
the WARN-and-return-error path were taken.

However, the reason for including such a check is because we're crossing
between logical submodules here. The GuC submission code requires that
the LRC code (which is NOT GuC-specific) has previously called the
precheck-for-space function and got the go-ahead signal. So this is
essentially checking that the internal protocol has been followed
correctly. If the LRC code were updated such that it might miss the
precheck-for-space in some paths, this WARN-and-fail check would help
identify the protocol error.


I can change it to a BUG() if you prefer!
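
The precheck-then-add protocol between the two layers could be sketched like this. Everything here is hypothetical (struct client, precheck_space(), add_item(), and the -EINVAL return), and fprintf() merely stands in for the kernel's WARN path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-client state shared by the two layers. */
struct client {
	uint32_t free_bytes;  /* space left in the work queue */
	bool space_checked;   /* set by the precheck, consumed by the add */
};

/* Called by the generic (LRC-like) layer before committing to submit. */
static int precheck_space(struct client *c, uint32_t needed)
{
	if (c->free_bytes < needed)
		return -EAGAIN;
	c->space_checked = true;
	return 0;
}

/* Called by the GuC-specific layer; complains if the precheck was skipped. */
static int add_item(struct client *c, uint32_t size)
{
	if (!c->space_checked || c->free_bytes < size) {
		/* Stand-in for the WARN-and-return-error path. */
		fprintf(stderr, "protocol error: add_item without precheck\n");
		return -EINVAL;
	}
	c->free_bytes -= size;
	c->space_checked = false;
	return 0;
}
```

If every caller follows the protocol, the error branch in add_item() is unreachable, which is the argument for making it void; the check only earns its keep when a new code path forgets the precheck.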

Not that you even successfully hooked up the failure paths in
the current code.

The current code is broken at least because the whole callchain is
declared as returning an int error code, only to find that the topmost
level says that add_request/emit_request are not allowed to fail!

It would be rather better if the top level handled all errors in
submission, even if it were only by dropping the request and resetting
the engine!

Same for guc_ring_doorbell.

The rationale for an error return is slightly different here, as this
would not indicate an internal logic error in the driver, but rather a
failure of communication between the driver and the doorbell hardware
(or possibly the GuC firmware, but I don't think a GuC bug could cause
more than one failure-to-update, hence the retry count is only 2).

If the doorbell hardware is not working correctly, there is nothing we
can do about it here; the only option is to report the failure up to
some layer that can recover, probably by resetting the GPU.

And what exactly is that atomic64_cmpxchg() serialising with? There are
no other CPUs contending with the write, and neither does the GuC (and I
doubt it is taking any notice of the lock cmpxchg). Using cmpxchg where
a single WRITE_ONCE() of a 32bit value would do wins the perf prize for
hottest instruction and function in the kernel.
-Chris

The doorbell controller hardware, I should think. The BSpec describes
using LOCK_CMPXCHG8B to update doorbells, so I think this code is just
based on what it says there. If the CPU hardware doesn't implement it
efficiently, surely the GPU h/w designers wouldn't have mandated it in
this way?
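
For reference, a 64-bit compare-and-swap doorbell update with a bounded retry might look roughly like the sketch below. It uses C11 atomics as a userspace stand-in for the kernel's atomic64_cmpxchg(); the doorbell layout (status in the low 32 bits, cookie in the high 32 bits), DB_ENABLED, and the error codes are assumptions for illustration, not taken from the BSpec or the driver.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define DB_ENABLED 1u  /* hypothetical status value */

/* Pack status (low 32 bits) and cookie (high 32 bits) into one word. */
static uint64_t db_pack(uint32_t status, uint32_t cookie)
{
	return ((uint64_t)cookie << 32) | status;
}

/*
 * Advance the cookie with a 64-bit compare-and-swap, trying at most
 * twice. Returns 0 on success, -EIO if the doorbell is not enabled,
 * -EAGAIN if both attempts lost the race.
 */
static int ring_doorbell(_Atomic uint64_t *db, uint32_t *cookie)
{
	int attempts = 2;  /* retry count mentioned in the thread */

	while (attempts--) {
		uint64_t expected = db_pack(DB_ENABLED, *cookie);
		uint64_t desired = db_pack(DB_ENABLED, *cookie + 1);

		if (atomic_compare_exchange_strong(db, &expected, desired)) {
			*cookie += 1;
			return 0;
		}
		/* On failure, expected now holds the current doorbell value. */
		if ((uint32_t)expected != DB_ENABLED)
			return -EIO;  /* report upward; nothing to fix here */
		*cookie = (uint32_t)(expected >> 32);  /* resync and retry */
	}
	return -EAGAIN;
}
```

Whether the compare-and-swap is actually needed, or a single plain store of the 32-bit cookie would do, is exactly the open question in this thread; the sketch only shows the shape of the mechanism being debated.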


Maybe Alex or Tom know more about the CPU<->doorbell<->GuC signalling.

.Dave.


Re: [Intel-gfx] [PATCH 2/2] drm/i915/guc: default to using GuC submission where possible
