On 08/04 11:01:09, Bill Fischofer wrote:
> On Thu, Aug 4, 2016 at 10:59 AM, Mike Holmes <[email protected]> wrote:
>
> > On 4 August 2016 at 11:47, Bill Fischofer <[email protected]>
> > wrote:
> >
> >> On Thu, Aug 4, 2016 at 10:36 AM, Mike Holmes <[email protected]>
> >> wrote:
> >>
> >>> On my vanilla x86 I don't get any issues; keen to get this in and
> >>> have CI run it on lots of HW to see what happens. Many of the other
> >>> tests completely fail in process mode, so I think we will expose a
> >>> lot as we add them.
> >>>
> >>> On 4 August 2016 at 11:33, Bill Fischofer <[email protected]>
> >>> wrote:
> >>>
> >>>> On Thu, Aug 4, 2016 at 10:26 AM, Brian Brooks <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Reviewed-by: Brian Brooks <[email protected]>
> >>>>>
> >>>>> On 08/04 09:18:14, Mike Holmes wrote:
> >>>>> > +ret=0
> >>>>> > +
> >>>>> > +run()
> >>>>> > +{
> >>>>> > +	echo odp_scheduling_run_proc starts with $1 worker threads
> >>>>> > +	echo =====================================================
> >>>>> > +
> >>>>> > +	$PERFORMANCE/odp_scheduling${EXEEXT} --odph_proc -c $1 || ret=1
> >>>>> > +}
> >>>>> > +
> >>>>> > +run 1
> >>>>> > +run 8
> >>>>> > +
> >>>>> > +exit $ret
> >>>>>
> >>>>> Seeing this randomly in both multithread and multiprocess modes:
> >>>>
> >>>> Before or after you apply this patch? What environment are you
> >>>> seeing these errors in? They should definitely not be happening.
> >>>>
> >>>>> ../../../odp/platform/linux-generic/odp_queue.c:328:odp_queue_destroy():queue "sched_00_07" not empty
> >>>>> ../../../odp/platform/linux-generic/odp_schedule.c:271:schedule_term_global():Queue not empty
> >>>>> ../../../odp/platform/linux-generic/odp_schedule.c:294:schedule_term_global():Pool destroy fail.
> >>>>> ../../../odp/platform/linux-generic/odp_init.c:188:_odp_term_global():ODP schedule term failed.
> >>>>> ../../../odp/platform/linux-generic/odp_queue.c:170:odp_queue_term_global():Not destroyed queue: sched_00_07
> >>>>> ../../../odp/platform/linux-generic/odp_init.c:195:_odp_term_global():ODP queue term failed.
> >>>>> ../../../odp/platform/linux-generic/odp_pool.c:149:odp_pool_term_global():Not destroyed pool: odp_sched_pool
> >>>>> ../../../odp/platform/linux-generic/odp_pool.c:149:odp_pool_term_global():Not destroyed pool: msg_pool
> >>>>> ../../../odp/platform/linux-generic/odp_init.c:202:_odp_term_global():ODP buffer pool term failed.
> >>>>> ~/odp_incoming/odp_build/test/common_plat/performance$ echo $?
> >>>>> 0
> >>>>>
> >> Looks like we have a real issue that somehow crept into master. I can
> >> sporadically reproduce these same errors on my x86 system. It looks
> >> like this is also present in the monarch_lts branch.
> >
> > I think that we agreed that Monarch would not support process mode
> > because we never tested for it, but for TgrM we need to start fixing
> > it.
>
> Unfortunately the issue Brian identified has nothing to do with process
> mode. This happens in regular pthread mode on all levels past v1.10.0.0
> as far as I can see.
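Note that in the transcript above `echo $?` prints 0 even though the teardown diagnostics were logged: the `run` wrapper only propagates the binary's exit status, and odp_scheduling itself exits 0 despite the failed termination. A hypothetical variant of the wrapper (not part of the patch; the log patterns matched are just the ones seen above) that also fails when those diagnostics appear on stderr:

```shell
#!/bin/sh
# Hypothetical run() wrapper: besides propagating the binary's exit
# status, also fail when ODP teardown diagnostics ("not empty",
# "term failed") appear on stderr, so the suite cannot report success
# while errors scroll by.
ret=0

run()
{
	workers=$1
	shift
	echo "odp_scheduling_run_proc starts with $workers worker threads"
	echo "====================================================="

	errlog=$(mktemp)
	"$@" 2>"$errlog" || ret=1
	if grep -Eq "not empty|term failed" "$errlog"; then
		cat "$errlog" >&2
		ret=1
	fi
	rm -f "$errlog"
}
```

In the real script this would be invoked as `run 1 $PERFORMANCE/odp_scheduling${EXEEXT} --odph_proc -c 1` (and again for 8), followed by `exit $ret`. Matching on log text is fragile, of course; the proper fix is for odp_scheduling to return non-zero when termination fails.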
The issue seems to emerge only under high event rates: the application asks the scheduler for more work and none is returned, yet there is still work sitting in the queue. Teardown then fails because the queue is not empty. There may be a disconnect between the scheduler and the queueing layer, or some other synchronization-related bug. I think I've seen something similar on an ARM platform, so it may be architecture-independent.
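Since the failure is sporadic, one way to chase it is to rerun the test until the teardown diagnostics appear. A minimal sketch (the `repro` helper and the 100-run cap are hypothetical; `-c 8` matches the worker count used in the test above):

```shell
#!/bin/sh
# Hypothetical repro helper: rerun the given binary until one iteration
# emits the "not empty"/"term failed" teardown diagnostics, or give up
# after 100 clean runs.
repro()
{
	bin=$1
	i=0
	while [ "$i" -lt 100 ]; do
		i=$((i + 1))
		if "$bin" -c 8 2>&1 | grep -Eq "not empty|term failed"; then
			echo "reproduced on iteration $i"
			return 1
		fi
	done
	echo "no failure in $i runs"
	return 0
}
```

Invoked as e.g. `repro ./odp_scheduling` from the performance test directory, this gives a rough feel for how often the race fires on a given machine.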
