> 

> > alan:snip
> > > > @@ -3279,6 +3322,17 @@ static void destroyed_worker_func(struct work_struct *w)
> > > >         struct intel_gt *gt = guc_to_gt(guc);
> > > >         int tmp;
> > > > 
> > > > +       /*
> > > > +        * In rare cases we can get here via async context-free fence-signals that
> > > > +        * come very late in suspend flow or very early in resume flows. In these
> > > > +        * cases, GuC won't be ready but just skipping it here is fine as these
> > > > +        * pending-destroy-contexts get destroyed totally at GuC reset time at the
> > > > +        * end of suspend.. OR.. this worker can be picked up later on the next
> > > > +        * context destruction trigger after resume-completes
> > > 
> > > who is triggering the work queue again?
> > 
> > alan: short answer: we don't know - still hunting this (getting closer
> > now, using task tgid str-name lookups).
> > In the few times I've seen it, the call stack looked like this:
> > 
> > [33763.582036] Call Trace:
> > [33763.582038]  <TASK>
> > [33763.582040]  dump_stack_lvl+0x69/0x97
> > [33763.582054]  guc_context_destroy+0x1b5/0x1ec
> > [33763.582067]  free_engines+0x52/0x70
> > [33763.582072]  rcu_do_batch+0x161/0x438
> > [33763.582084]  rcu_nocb_cb_kthread+0xda/0x2d0
> > [33763.582093]  kthread+0x13a/0x152
> > [33763.582102]  ? rcu_nocb_gp_kthread+0x6a7/0x6a7
> > [33763.582107]  ? css_get+0x38/0x38
> > [33763.582118]  ret_from_fork+0x1f/0x30
> > [33763.582128]  </TASK>

> Alan, the above trace is not due to a missing GT wakeref; it is due to an
> intel_context_put() that is called asynchronously by call_rcu(__free_engines).
> We need to insert an rcu_barrier() in late suspend to flush all pending RCU
> callbacks.
> 
> Thanks,
> Anshuman.
Thanks Anshuman for following up with the ongoing debug. I shall re-rev 
accordingly.
...alan
