On 13/03/17 01:56, Oscar Mateo wrote:


On 03/13/2017 04:46 AM, Chris Wilson wrote:
On Fri, Mar 10, 2017 at 08:28:55AM -0800, Oscar Mateo wrote:
Doorbell release flow requires that we wait for GEN8_DRB_VALID bit to go
to zero after updating db_status before we call the GuC to release the
doorbell.
Does this fix some of the '110' errors?
-Chris

No, AFAICT this doesn't fix anything (but it's required nonetheless).

The 110 errors (ENTER_S_STATE and DEALLOCATE_DOORBELL failed calls to
GuC) are cause by us resetting the GuC right before we ask it to do
something. E.g.: in the suspend path, we do
i915_gem_suspend()->i915_gem_sanitize()->intel_gpu_reset() and then
intel_guc_suspend() inmediately after. After the sanitize, the GuC is in
reset state and without a firmware loaded, so asking it to suspend is
pretty useless. Same thing when trying to orderly shutdown the doorbells
right after reset (the GuC has no idea of what doorbell we are talking
about).

For now, the simple fix would be to keep doing the reset and not ask the
GuC to do anything after. But sooner or later, we won't be able to get
away with this. E.g.: if we enable direct submission, the GuC is the
only one that will know about all the requests in-flight, and therefore
the only one that can decide when to suspend. Also, surely loading the
GuC firmware every time is slowing down the resume path by a lot?

-- Oscar

To add a bit of info, the wait is required because if there are pending rings on the doorbell the HW may require a slightly longer time to disable it after we update the status. If we call into GuC before the disabling is completed GuC will return an error because the doorbell will still be active from its point of view, even if it is in the process of shutting down. I think that with our current flow it should be impossible to hit this situation so it as Oscar said it doesn't fix anything, but it is still worth adding it to avoid issues in the future.

In regards to the reset, I agree that we should either reorder stuff to make sure that the H2G messages are sent before the reset or drop the messages entirely. In the particular case of suspend, it is not guaranteed that the GuC will survive a suspend/resume cycle (it depends on the depth of the sleep state). That is why the ENTER_S_STATE message exists: this message tells GuC to save all the information it needs in the buffer we provided; we can then tell GuC to restore this state with the EXIT_S_STATE message, which can be used on a freshly loaded FW as well. With our current flow the GuC queues should be empty when we go to suspend so it shouldn't matter, but direct submission will need this to work.

Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to