Hey Hal,

Thanks.  You're right about the tweak below.  I'll resubmit the patch.

Al

On Fri, 2014-06-27 at 17:36 -0400, Hal Rosenstock wrote:
> On 6/27/2014 1:56 PM, Albert Chu wrote:
> > Introduce new sweep state PERFMGR_SWEEP_POST_PROCESSING to fix
> > race in perfmgr.
> > 
> > Race occurs as follows:
> 
> Nice find!
> 
> > 
> > Under typical conditions, osm_perfmgr_process() is entered
> > with sweep_state set to PERFMGR_SWEEP_SLEEP.  osm_perfmgr_process()
> > sets sweep_state to PERFMGR_SWEEP_ACTIVE when it begins to sweep.
> > 
> > osm_perfmgr_process() will eventually call perfmgr_send_mad() by
> > way of perfmgr_query_counters() and several other functions.
> > 
> > Responses to performance counter MADs may initiate the sending
> > of more MADs via perfmgr_send_mad(), such as through redirection
> > or the desire to clear counters.
> > 
> > If too many MADs have been put on the wire, perfmgr_send_mad()
> > will throttle sending out MADS and temporarily change sweep_state
> > between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE as it
> > throttles.  The sweep_state is set to PERFMGR_SWEEP_ACTIVE
> > when all performance counter MADs have been sent out by the sweeper.
> > 
> > osm_perfmgr_process() eventually completes its sweep and puts
> > sweep_state back into PERFMGR_SWEEP_SLEEP.
> > 
> > At this point, some MADs may still be on the wire.  New MADs may be
> > put back on the wire if responses necessitate it (redirection or
> > clearing counters).  If enough MADs are put back onto the wire,
> > perfmgr_send_mad() will throttle as normal, temporarily moving
> > between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE.  After
> > the throttling is complete, sweep_state is put into
> > PERFMGR_SWEEP_ACTIVE state.
> > 
> > This is the key problem, the sweep_state is changed from
> > PERFMGR_SWEEP_SLEEP to PERFMGR_SWEEP_ACTIVE outside of
> > osm_perfmgr_process().
> > 
> > Now that the perfmgr is in ACTIVE state, any future sweep call to
> > osm_perfmgr_process() will not sweep b/c the sweep_state is set
> > to PERFMGR_SWEEP_ACTIVE.
> > 
> > The introduction of a new sweep_state PERFMGR_SWEEP_POST_PROCESSING
> > fixes this problem.
> > 
> > If perfmgr_send_mad() throttles mads while in PERFMGR_SWEEP_SLEEP.
> > sweep_state will be moved into the PERFMGR_SWEEP_POST_PROCESSING
> > state instead of PERFMGR_SWEEP_SUSPENDED/PERFMGR_SWEEP_ACTIVE.
> > 
> > When all post-SLEEP state MAD processing is complete, the sweep_state
> > will move from PERFMGR_SWEEP_POST_PROCESSING back to PERFMGR_SWEEP_SLEEP,
> > so that future sweeps can operate as normal.
> > 
> > Signed-off-by: Albert L. Chu <[email protected]>
> > ---
> >  include/opensm/osm_perfmgr.h |    6 +++++-
> >  opensm/osm_perfmgr.c         |   21 +++++++++++++++++++--
> >  2 files changed, 24 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/opensm/osm_perfmgr.h b/include/opensm/osm_perfmgr.h
> > index ab02c26..44a278d 100644
> > --- a/include/opensm/osm_perfmgr.h
> > +++ b/include/opensm/osm_perfmgr.h
> > @@ -88,7 +88,8 @@ typedef enum {
> >  typedef enum {
> >     PERFMGR_SWEEP_SLEEP,
> >     PERFMGR_SWEEP_ACTIVE,
> > -   PERFMGR_SWEEP_SUSPENDED
> > +   PERFMGR_SWEEP_SUSPENDED,
> > +   PERFMGR_SWEEP_POST_PROCESSING
> >  } osm_perfmgr_sweep_state_t;
> >  
> >  typedef struct monitored_port {
> > @@ -239,6 +240,9 @@ inline static const char 
> > *osm_perfmgr_get_sweep_state_str(osm_perfmgr_t * perfmg
> >     case PERFMGR_SWEEP_SUSPENDED:
> >             return "Suspended";
> >             break;
> > +   case PERFMGR_SWEEP_POST_PROCESSING:
> > +           return "PostProcessing";
> > +           break;
> >     }
> >     return "UNKNOWN";
> >  }
> > diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
> > index bfa63dc..863e9bf 100644
> > --- a/opensm/osm_perfmgr.c
> > +++ b/opensm/osm_perfmgr.c
> > @@ -164,6 +164,16 @@ static inline void 
> > decrement_outstanding_queries(osm_perfmgr_t * pm)
> >  {
> >     cl_atomic_dec(&pm->outstanding_queries);
> >     cl_event_signal(&pm->sig_query);
> > +
> > +   if (!pm->outstanding_queries) {
> > +           cl_spinlock_acquire(&pm->lock);
> > +           if (pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
> > +                   pm->sweep_state = PERFMGR_SWEEP_SLEEP;
> > +                   OSM_LOG(pm->log, OSM_LOG_INFO,
> > +                           "PM sweep state exiting Post Processing\n");
> > +           }
> > +           cl_spinlock_release(&pm->lock);
> > +   }
> 
> Shouldn't this potential sweep state update occur prior to the event
> being signaled ?
> 
> -- Hal
> 
> >  }
> >  
> >  /**********************************************************************
> > @@ -439,7 +449,13 @@ static ib_api_status_t perfmgr_send_mad(osm_perfmgr_t 
> > *perfmgr,
> >             while (perfmgr->outstanding_queries >
> >                    (int32_t)perfmgr->max_outstanding_queries) {
> >                     cl_spinlock_acquire(&perfmgr->lock);
> > -                   perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
> > +                   if (perfmgr->sweep_state == PERFMGR_SWEEP_SLEEP) {
> > +                           perfmgr->sweep_state = 
> > PERFMGR_SWEEP_POST_PROCESSING;
> > +                           OSM_LOG(perfmgr->log, OSM_LOG_INFO,
> > +                                   "PM sweep state going into Post 
> > Processing\n");
> > +                   }
> > +                   else if (perfmgr->sweep_state == PERFMGR_SWEEP_ACTIVE)
> > +                           perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
> >                     cl_spinlock_release(&perfmgr->lock);
> >  wait:
> >                     sts = cl_event_wait_on(&perfmgr->sig_query,
> > @@ -1015,7 +1031,8 @@ void osm_perfmgr_process(osm_perfmgr_t * pm)
> >  
> >     cl_spinlock_acquire(&pm->lock);
> >     if (pm->sweep_state == PERFMGR_SWEEP_ACTIVE ||
> > -       pm->sweep_state == PERFMGR_SWEEP_SUSPENDED) {
> > +       pm->sweep_state == PERFMGR_SWEEP_SUSPENDED ||
> > +       pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) {
> >             cl_spinlock_release(&pm->lock);
> >             OSM_LOG(pm->log, OSM_LOG_INFO,
> >                     "PM sweep state %d, skipping sweep\n",
> 
-- 
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to