Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

Bill Fischofer Mon, 19 Sep 2016 13:42:13 -0700

On Mon, Sep 19, 2016 at 2:11 PM, Brian Brooks <brian.bro...@linaro.org>
wrote:


> On 09/19 07:55:22, Elo, Matias (Nokia - FI/Espoo) wrote:
> > >
> > > On 09/14 11:53:06, Matias Elo wrote:
> > > > +
> > > > + /* Clear possible locally stored buffers */
> > > > + odp_schedule_pause();
> > > > +
> > > > + while (1) {
> > > > +         ev = odp_schedule(&src_queue, ODP_SCHED_NO_WAIT);
> > > > +
> > > > +         if (ev == ODP_EVENT_INVALID)
> > > > +                 break;
> > > > +
> > > > +         if (odp_queue_enq(src_queue, ev)) {
> > > > +                 LOG_ERR("[%i] Queue enqueue failed.\n", thr);
> > > > +                 odp_event_free(ev);
> > > > +                 return -1;
> > > > +         }
> > > > + }
> > > > +
> > > > + odp_schedule_resume();
> > >
> > > Is it possible to skip this and go straight to draining the queues?
> > >
> > > Locally pre-scheduled work is an implementation detail that should be
> > > hidden by the scheduling APIs.
> > >
> > > A hardware scheduler may not pre-schedule work to cores the way the
> > > current software implementation does.
> >
> > Also some HW schedulers may operate in push mode and do local caching.
> > Calling odp_schedule_pause() is the only ODP method to signal the scheduler
> > to stop this. So to keep the application platform agnostic (and follow the
> > API documentation), this step cannot be skipped.
> >
> > -Matias
>
> Thinking in the general sense..
>
> Should applications have to reason about _and_ code around pre-scheduled
> and non-scheduled events? If the event hasn't crossed the API boundary to
> be delivered to the application according to the scheduling group policies
> for that core, what is the difference to the application?
>
> If a scheduler implementation uses TLS to pre-schedule events it also seems
> like it should be able to support work-stealing of those pre-scheduled
> events by other threads in the runtime case where odp_schedule() is not
> called from that thread or the thread id is removed from scheduling group
> masks. From the application perspective these are all implementation details.
>

You're making an argument I made some time back. :)  As I recall, the
rationale for pause/resume was to make life easier for existing code that
is introducing ODP on a more gradual basis. Presumably Nokia has examples
of such code in house.

From a design standpoint worker threads shouldn't "change their minds" and
go off to do something else for a while. For whatever else they might want
to do it would seem that such requirements would be better served by simply
having another thread to do the other things that wakes up periodically to
do them.


>
> This pause state may also cause some confusion for application writers
> because it is now possible to write two different event loops for the same
> core depending on how a particular scheduler implementation behaves. The
> semantics seem to blur a bit with scheduling groups. Level of abstraction can
> be raised by deprecating the scheduler pause state and APIs.
>

This is a worthwhile discussion to have. I'll add it to the agenda for
tomorrow's ODP call and we can include it in the wider scheduler
discussions scheduled for next week. The other rationale for not wanting
this behavior (another argument I advanced earlier) is that it greatly
complicates recovery processing. A robustly designed application should be
able to recover from the failure of an individual thread (this is
especially true if the ODP thread is in fact a separate process). If the
implementation has prescheduled events to a failed thread then how are they
recovered gracefully? Conversely, if the implementation can recover from
such a scenario then it would seem it could equally "unschedule" prestaged
events as needed due to thread termination (normal or abnormal) or for load
balancing purposes.
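
To make the "unschedule" idea concrete, here is a toy sketch in plain C. It
does not use any ODP API: toy_queue_t, toy_stash_t and the toy_* functions are
hypothetical names made up for illustration, and a real implementation would
also have to deal with ordering, atomic contexts and concurrency. It only
shows that events prestaged in a per-thread stash can be handed back to the
shared queue without the application draining them itself.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16 /* capacity of the toy shared queue */
#define STASH_CAP 4  /* max events prestaged per thread */

/* Toy shared queue: a fixed-size ring of integer "events". */
typedef struct {
	int ev[QUEUE_CAP];
	size_t head;
	size_t count;
} toy_queue_t;

/* Toy per-thread stash holding prestaged events (hypothetical). */
typedef struct {
	int ev[STASH_CAP];
	size_t count;
} toy_stash_t;

/* Append an event to the tail of the queue; returns -1 if full. */
static int toy_enq(toy_queue_t *q, int ev)
{
	if (q->count == QUEUE_CAP)
		return -1;
	q->ev[(q->head + q->count) % QUEUE_CAP] = ev;
	q->count++;
	return 0;
}

/* Remove an event from the head of the queue; returns -1 if empty. */
static int toy_deq(toy_queue_t *q, int *ev)
{
	assert(ev != NULL);
	if (q->count == 0)
		return -1;
	*ev = q->ev[q->head];
	q->head = (q->head + 1) % QUEUE_CAP;
	q->count--;
	return 0;
}

/* Move events from the queue into the thread's stash until it is full. */
static size_t toy_prestage(toy_queue_t *q, toy_stash_t *s)
{
	size_t moved = 0;
	int ev;

	while (s->count < STASH_CAP && toy_deq(q, &ev) == 0) {
		s->ev[s->count++] = ev;
		moved++;
	}
	return moved;
}

/* "Unschedule": hand stashed events back to the queue, oldest first. */
static size_t toy_unschedule(toy_queue_t *q, toy_stash_t *s)
{
	size_t moved = 0;
	size_t i;

	while (moved < s->count && toy_enq(q, s->ev[moved]) == 0)
		moved++;

	/* Keep any events that did not fit, preserving their order. */
	for (i = moved; i < s->count; i++)
		s->ev[i - moved] = s->ev[i];
	s->count -= moved;
	return moved;
}
```

In this toy model the returned events land at the tail of the queue, so they
are reordered relative to events that were never prestaged. Whether that is
acceptable depends on the queue type, which is one of the things the wider
discussion would need to settle.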

We may not be able to fully deprecate these APIs, but perhaps we can make
it clearer how they are intended to be used and classify them as
"discouraged" for new code.


>
> > > The ODP implementation for that environment
> > > would have to turn the scheduling call into a nop for that core if it is
> > > paused by use of these APIs. Another way to implement it would be to remove
> > > this core from all queue scheduling groups and leave the schedule call as-is.
> > > If implemented by the first method, the application writer could simply just
> > > not call the API to schedule work. If implemented by the second method, there
> > > are already scheduling group APIs to do this.
> >
> > The ODP implementation is free to choose how it implements these calls. For
> > example adding a single 'if (odp_unlikely(x))' to odp_schedule() to make it a NOP
> > after odp_schedule_pause() has been called shouldn't cause a significant overhead.
> >
> > >
> > > Are odp_schedule_pause() and odp_schedule_resume() deprecated?
> >
> > Nope.
> >
> > >
> > > > + odp_barrier_wait(&globals->barrier);
> > > > +
> > > > + clear_sched_queues();
>
