On Tue, 2023-08-15 at 09:56 -0400, Vivi, Rodrigo wrote:
> On Mon, Aug 14, 2023 at 06:12:09PM -0700, Alan Previn wrote:
> > If we are at the end of suspend or very early in resume,
> > it's possible an async fence signal could lead us to the
> > execution of the context destruction worker (after the
> > prior worker flush).

[snip]
> 
> > @@ -3199,10 +3206,20 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >     if (unlikely(disabled)) {
> >             release_guc_id(guc, ce);
> >             __guc_context_destroy(ce);
> > -           return;
> > +           return 0;
> > +   }
> > +
> > +   if (deregister_context(ce, ce->guc_id.id)) {
> > +           /* Seal race with concurrent suspend by unrolling */
> > +           spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +           set_context_registered(ce);
> > +           clr_context_destroyed(ce);
> > +           intel_gt_pm_put(gt);
> 
> how sure are we this is not calling unbalanced puts?
alan: in this function (guc_lrc_desc_unpin), the summarized flow is:

        check-status-stuff
        if guc-is-not-disabled: take the pm, change ctx-state-bits
        [implied else] if guc-is-disabled: scrub the ctx and return

thus deregister_context is only called if we didn't take the early
return, i.e. when guc-is-not-disabled, i.e. after the pm was taken, so
the put in the unroll path pairs 1:1 with that get (see the condensed
sketch below).
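
alan: (adding for clarity) here is a condensed, non-authoritative sketch
of that flow -- names follow the quoted hunk where it has them; the
helpers ce_to_guc, guc_to_gt, submission_disabled and __intel_gt_pm_get
are my assumptions about the surrounding code, paraphrased from memory:

    static int guc_lrc_desc_unpin(struct intel_context *ce)
    {
        struct intel_guc *guc = ce_to_guc(ce);
        struct intel_gt *gt = guc_to_gt(guc);
        unsigned long flags;
        bool disabled;

        /* check-status-stuff: sample guc state under the lock */
        spin_lock_irqsave(&ce->guc_state.lock, flags);
        disabled = submission_disabled(guc);
        if (likely(!disabled)) {
            /* guc-is-not-disabled: take the pm ref before flagging
             * the context for deregistration; using the non-sleeping
             * get variant since we hold a spinlock and the gt is
             * assumed awake on this path */
            __intel_gt_pm_get(gt);
            set_context_destroyed(ce);
            clr_context_registered(ce);
        }
        spin_unlock_irqrestore(&ce->guc_state.lock, flags);

        if (unlikely(disabled)) {
            /* guc-is-disabled: scrub the ctx and return;
             * no pm was taken above, so there is nothing to put */
            release_guc_id(guc, ce);
            __guc_context_destroy(ce);
            return 0;
        }

        if (deregister_context(ce, ce->guc_id.id)) {
            /* Seal race with concurrent suspend by unrolling;
             * this put pairs with the get above, which must have
             * happened since we are past the disabled early-return */
            spin_lock_irqsave(&ce->guc_state.lock, flags);
            set_context_registered(ce);
            clr_context_destroyed(ce);
            intel_gt_pm_put(gt);
            spin_unlock_irqrestore(&ce->guc_state.lock, flags);
            return -ENODEV;
        }

        return 0;
    }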
> could we wrap this in some existent function to make this clear?
alan: yeah - it's not as readable as it could be - let me tweak this
function and make it cleaner (a sketch of one option follows below).
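
alan: (sketching the idea, not the final patch) one way to make this
read cleaner is to pull the unroll into a small helper; the helper name
is hypothetical, just to show the shape:

    /* hypothetical helper -- body mirrors the unroll in the quoted hunk */
    static void __guc_context_destroy_unroll(struct intel_context *ce,
                                             struct intel_gt *gt)
    {
        unsigned long flags;

        /* Seal race with concurrent suspend by unrolling */
        spin_lock_irqsave(&ce->guc_state.lock, flags);
        set_context_registered(ce);
        clr_context_destroyed(ce);
        intel_gt_pm_put(gt);
        spin_unlock_irqrestore(&ce->guc_state.lock, flags);
    }

so that guc_lrc_desc_unpin() ends with a self-describing tail:

    if (deregister_context(ce, ce->guc_id.id)) {
        __guc_context_destroy_unroll(ce, gt);
        return -ENODEV;
    }

    return 0;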

> 
> > +           spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +           return -ENODEV;
> >     }
> >  
> > -   deregister_context(ce, ce->guc_id.id);
> > +   return 0;
> >  }
