Re: [lng-odp] [PATCH] test: odp_sched_latency: robust draining of queues

Bill Fischofer Mon, 24 Apr 2017 16:21:23 -0700

On Mon, Apr 24, 2017 at 5:26 PM, Ola Liljedahl <ola.liljed...@arm.com>
wrote:


> (Responding from PoC Outlook)
>
> From:  Bill Fischofer <bill.fischo...@linaro.org>
> Date:  Tuesday, 25 April 2017 at 00:00
> To:  Brian Brooks <brian.bro...@arm.com>
> Cc:  lng-odp-forward <lng-odp@lists.linaro.org>, Ola Liljedahl
> <ola.liljed...@arm.com>
> Subject:  Re: [lng-odp] [PATCH] test: odp_sched_latency: robust draining
> of queues
>
>
>
>
> On Mon, Apr 24, 2017 at 3:58 PM, Brian Brooks
> <brian.bro...@arm.com> wrote:
>
> From: Ola Liljedahl <ola.liljed...@arm.com>
>
> In order to robustly drain all queues when the benchmark has
> ended, we enqueue a special event on every queue and invoke
> the scheduler until all such events have been received.
>
> Signed-off-by: Ola Liljedahl <ola.liljed...@arm.com>
> Reviewed-by: Brian Brooks <brian.bro...@arm.com>
> ---
>  test/common_plat/performance/odp_sched_latency.c | 51
> ++++++++++++++++++------
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/test/common_plat/performance/odp_sched_latency.c
> b/test/common_plat/performance/odp_sched_latency.c
> index 2b28cd7b..7ba99fc6 100644
> --- a/test/common_plat/performance/odp_sched_latency.c
> +++ b/test/common_plat/performance/odp_sched_latency.c
> @@ -57,9 +57,10 @@ ODP_STATIC_ASSERT(LO_PRIO_QUEUES <= MAX_QUEUES, "Too
> many LO priority queues");
>
>  /** Test event types */
>  typedef enum {
> -       WARM_UP, /**< Warm up event */
> -       TRAFFIC, /**< Event used only as traffic load */
> -       SAMPLE   /**< Event used to measure latency */
> +       WARM_UP,  /**< Warm up event */
> +       COOL_DOWN,/**< Last event on queue */
> +       TRAFFIC,  /**< Event used only as traffic load */
> +       SAMPLE    /**< Event used to measure latency */
>  } event_type_t;
>
>  /** Test event */
> @@ -114,16 +115,40 @@ typedef struct {
>   *
>   * Retry to be sure that all buffers have been scheduled.
>   */
> -static void clear_sched_queues(void)
> +static void clear_sched_queues(test_globals_t *globals)
>  {
>         odp_event_t ev;
> +       odp_buffer_t buf;
> +       test_event_t *event;
> +       int i, j;
> +       uint32_t numtogo = 0;
>
>
>
> int numtogo would be more standard here, and consistent with i, j
> immediately above.
> [Ola] Generally I prefer unsigneds for variables that should never be
> negative. Is there a good reason for using signed ints instead?
>
> By that logic i and j should be unsigned as well, but you always see them
> as int. int is usual because it's easier to type and just as performant for
> these sort of counter-type operations.
>
>
> -       while (1) {
> -               ev = odp_schedule(NULL, ODP_SCHED_NO_WAIT);
> -
> -               if (ev == ODP_EVENT_INVALID)
> -                       break;
> -
> +       /* Enqueue special cool_down events on all queues (one per queue)
> */
> +       for (i = 0; i < NUM_PRIOS; i++) {
> +               for (j = 0; j < globals->args.prio[i].queues; j++) {
> +                       buf = odp_buffer_alloc(globals->pool);
> +                       if (buf == ODP_BUFFER_INVALID) {
> +                               LOG_ERR("Buffer alloc failed.\n");
> +                               return;
> +                       }
>
>
>
> This isn't terribly robust. In the unlikely event that this call fails
> you're leaving a bunch of events (including the previously successfully
> allocated COOL_DOWN markers) on the queues. At minimum you should just
> break out of this "marking" phase and
>  proceed to drain the remaining queues for the numgoto markers already on
> them.
> [Ola] What does partial draining achieve?
>

It avoids leaking the markers that were allocated before we ran out of
buffer space for whatever reason. The return here isn't terminating the
application just telling the caller that the queues have been cleared,
which in this case they have not. Since the queues are being reused for the
next round of processing, the upper layer of the application would need to
deal with one of these latent markers showing up, which would get messy.
I'd be happy with an abort() here if that's preferable.


>
>  More robust would be to preallocate all the markers needed up front or
> else do each queue individually to avoid having to buffer the markers.
> [Ola] Using only one marker event and cycling through the queues while
> draining them is an interesting idea. I might code this and see how it
> looks.
>
> [Ola] AFAIK, this is robust in the absence of errors.
> You are raising the bar. The code is full of ³return -1² if some
> unexpected error occurs. Sometimes some known resource (e.g. event) is
> freed. I don¹t think this program is guaranteed to clean up all resources
> if an error occurs. Even if this was the goal, actually verifying this
> would be cumbersome, achieving full code coverage for all error-related
> code paths. The used coding style also makes visual inspection difficult.
>
> Personally I am happy if (all) resources are freed for successful
> executions but don¹t think it is necessary to spend that much effort on
> cleaning up after unexpected errors (which probably increases the risk of
> new errors, trying to do too much when the system is already in an
> unexpected and potentially inconsistent state is asking for trouble),
> better to just die quickly. You don¹t make systems more robust by adding
> complexity.
>

All the more reason to simply abort() here, which would be fine since this
is just a test.


>
>
>
> +                       event = odp_buffer_addr(buf);
> +                       event->type = COOL_DOWN;
> +                       ev = odp_buffer_to_event(buf);
> +                       if (odp_queue_enq(globals->queue[i][j], ev)) {
> +                               LOG_ERR("Queue enqueue failed.\n");
> +                               odp_event_free(ev);
> +                               return;
> +                       }
> +                       numtogo++;
> +               }
> +       }
> +       /* Invoke scheduler until all cool_down events have been received
> */
> +       while (numtogo != 0) {
>
>
>
> while (numtogo > 0) would be more standard here.
> [Ola] I don¹t want to silently hide bugs by writing code which ³fixes²
> other errors in the code (e.g. by terminating also for ³numtogo < 0" we
> hide potential bugs that makes the numtogo negative even though this
> should not be possible).
>
>
If something happened to make numtogo negative (or hugely positive in the
case of unsigned) how would that be better? The standard (since Dijkstra)
method of guaranteed loop termination is to test for var > 0 against a
monotonically decrementing control var.


>
>
>
> +               ev = odp_schedule(NULL, ODP_SCHED_WAIT);
> +               buf = odp_buffer_from_event(ev);
> +               event = odp_buffer_addr(buf);
> +               if (event->type == COOL_DOWN)
> +                       numtogo--;
>                 odp_event_free(ev);
>         }
>  }
> @@ -394,10 +419,10 @@ static int test_schedule(int thr, test_globals_t
> *globals)
>
>         odp_barrier_wait(&globals->barrier);
>
> -       clear_sched_queues();
> -
> -       if (thr == MAIN_THREAD)
> +       if (thr == MAIN_THREAD) {
> +               clear_sched_queues(globals);
>                 print_results(globals);
> +       }
>
>         return 0;
>  }
> --
> 2.12.2
>
>
>
>
>
>
>

Re: [lng-odp] [PATCH] test: odp_sched_latency: robust draining of queues

Reply via email to