* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
> [Sorry for the delay, finally getting back to this.]
> 
> On Mon, Dec 17, 2012 at 09:40:09AM -0500, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
> > > On Thu, Dec 13, 2012 at 06:44:56AM -0500, Mathieu Desnoyers wrote:
> > > > I noticed that in addition to having:
> > > > 
> > > > - push/enqueue returning whether the stack/queue was empty prior to the
> > > >   operation,
> > > > - pop_all/splice, by nature, emptying the stack/queue,
> > > > 
> > > > it can be useful to make pop/dequeue operations return whether they
> > > > are returning the last element of the stack/queue (thereby emptying
> > > > it). This allows extending the test cases that count how many times
> > > > push/enqueue and pop/dequeue threads encounter an empty stack/queue:
> > > > they would then cover not only push/enqueue paired with
> > > > pop_all/splice, but also pop/dequeue.
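To make the proposed return-value convention concrete, here is a minimal standalone C11 sketch. It is not the liburcu wfstack code: the `sk_*` names are hypothetical, push uses a lock-free compare-and-swap loop rather than wfstack's wait-free exchange, and pops are assumed to be serialized by the caller (e.g. with a mutex), as the existing pop locking rules require. Push reports whether the stack was empty; pop reports whether it took the last element, decided by the same successful CAS that unlinks the node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the liburcu API. */
struct sk_node {
	struct sk_node *next;
};

struct sk_stack {
	_Atomic(struct sk_node *) head;	/* NULL when the stack is empty */
};

static void sk_init(struct sk_stack *s)
{
	atomic_init(&s->head, NULL);
}

/* Returns true if the stack was empty before this push. */
static bool sk_push(struct sk_stack *s, struct sk_node *node)
{
	struct sk_node *old = atomic_load(&s->head);

	assert(node != NULL);
	do {
		node->next = old;	/* on CAS failure, old holds the new head */
	} while (!atomic_compare_exchange_weak(&s->head, &old, node));
	return old == NULL;
}

/*
 * Pops must be serialized by the caller (this also rules out ABA on the
 * popped node); pushes may run concurrently. *last is decided by the
 * same CAS that unlinks the node: true only if it left the stack empty.
 */
static struct sk_node *sk_pop(struct sk_stack *s, bool *last)
{
	struct sk_node *old = atomic_load(&s->head);

	for (;;) {
		struct sk_node *next;

		if (old == NULL) {
			if (last)
				*last = false;
			return NULL;
		}
		next = old->next;
		if (atomic_compare_exchange_weak(&s->head, &old, next)) {
			if (last)
				*last = (next == NULL);
			return old;
		}
		/* A concurrent push changed head; old was reloaded, retry. */
	}
}
```

Pushing a then b reports `true` then `false`; the following pops return b with `last == false`, a with `last == true`, and finally NULL with `last == false`.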
> > > > 
> > > > In the case of wfstack, this unfortunately requires modifying an
> > > > already exposed API. As this is an RFC, one question we should answer
> > > > is how we want to handle the way forward: should we add new functions
> > > > to the wfstack API and leave the existing ones alone ?
> > > > 
> > > > Thoughts ?
> > > 
> > > Hmmm...  What is the use case, given that a push might happen immediately
> > > after the pop said that the stack/queue was empty?  Of course, if we
> > > somehow know that there are no concurrent pushes, we could instead
> > > check for empty.
> > > 
> > > So what am I missing here?
> > 
> > The setup for those use-cases is the following (I'm using the stack as
> > example, but the same applies to queue):
> > 
> > - we have N threads doing push and using the push return value, which
> >   states whether they pushed into an empty stack.
> > - we have M threads doing "pop", using the return value to know whether
> >   the pop left the stack empty. Following the locking requirements, we
> >   protect those M threads' pops with a mutex, but they don't need to be
> >   protected against push.
> > 
> > To help explain where the idea comes from, let's start with a similar
> > use-case (push/pop_all). Knowing whether we pushed into an empty stack,
> > together with pop_all, becomes very useful when you want to build
> > higher-level batching semantics on top of the elements present in the
> > stack.
> > 
> > In the case of grace period batching, for instance, I used
> > "push"/"pop_all" to provide this kind of semantics: if we push into an
> > empty stack, we know we will have to go through the grace period. If we
> > push into a non-empty stack, we just wait to be awakened by the first
> > thread that pushed into the (previously empty) stack. This requires that
> > we use "pop_all" before going through the grace period.
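A rough sketch of that batching pattern, again in standalone C11 with hypothetical `bt_*` names rather than the liburcu API: the thread whose push finds the stack empty acts as batch leader, detaches the whole batch with an atomic exchange (a stand-in for pop_all) before the grace period, then wakes the collected waiters. The grace-period wait and the wakeups are placeholders in comments; the function only returns the batch size.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the liburcu API. */
struct bt_node {
	struct bt_node *next;
};

struct bt_stack {
	_Atomic(struct bt_node *) head;
};

static void bt_init(struct bt_stack *s)
{
	atomic_init(&s->head, NULL);
}

/* Returns true if this push found the stack empty: the caller leads. */
static bool bt_push(struct bt_stack *s, struct bt_node *node)
{
	struct bt_node *old = atomic_load(&s->head);

	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&s->head, &old, node));
	return old == NULL;
}

/* pop_all stand-in: detach every node; the next push sees an empty stack. */
static struct bt_node *bt_pop_all(struct bt_stack *s)
{
	return atomic_exchange(&s->head, NULL);
}

/*
 * Leader side: detach the batch first, so that waiters arriving later
 * form the next batch, then wait for a grace period and wake the batch.
 * Returns the number of waiters (leader included) in the batch.
 */
static size_t bt_leader_run_batch(struct bt_stack *s)
{
	struct bt_node *n = bt_pop_all(s), *next;
	size_t count = 0;

	/* ... wait for a grace period here ... */
	for (; n != NULL; n = next) {
		next = n->next;	/* read first: a woken waiter may reuse n */
		count++;	/* ... wake the waiter that owns n here ... */
	}
	return count;
}
```

Pushing a then b reports `true` then `false`, so only a's thread leads; the batch it detaches contains both nodes, and the next push (c) again reports `true` and starts a new batch.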
> > 
> > Now, more specifically about "pop", one use-case I have in mind is
> > energy-efficient handling of empty stacks. With M threads executing
> > "pop", suppose we want them to block on a futex when there is nothing
> > to do. The tricky part is: how can we do this without adding overhead
> > (extra loads/stores) to the stack ?
> > 
> > If we know whether we are popping the last element of the stack, we
> > can use this information to enter a futex wait state after handling
> > that last element. Since the threads doing "push" would monitor whether
> > they push into an empty stack, they would wake us whenever needed.
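The sleeping-consumer idea can be sketched as follows, under stated assumptions and not as liburcu code: the `ws_*` names are hypothetical, a pthread mutex/condition variable plus a token counter stands in for the futex, and there is a single consumer to keep the reasoning simple. Each push into an empty stack posts one token; the consumer takes one token, pops until the pop that reports `last`, and goes back to sleep without a failing pop or an extra emptiness check. Because only the consumer empties the stack, a new push-to-empty cannot happen before the consumer has drained the previous period, so holding a token implies the stack is non-empty.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not the liburcu API. */
struct ws_node { struct ws_node *next; };

struct ws_stack {
	_Atomic(struct ws_node *) head;
	pthread_mutex_t lock;	/* lock + cond + tokens stand in for a futex */
	pthread_cond_t cond;
	unsigned long tokens;	/* push-to-empty events not yet consumed */
	struct ws_node *nodes;	/* demo only: nodes the producer pushes */
	unsigned long n;	/* demo only: number of nodes */
};

/* Producer side: only a push into an empty stack pays for a wakeup. */
static void ws_push(struct ws_stack *s, struct ws_node *node)
{
	struct ws_node *old = atomic_load(&s->head);

	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&s->head, &old, node));
	if (old == NULL) {
		pthread_mutex_lock(&s->lock);
		s->tokens++;
		pthread_cond_signal(&s->cond);
		pthread_mutex_unlock(&s->lock);
	}
}

/* Single consumer only; *last is true if this pop emptied the stack. */
static struct ws_node *ws_pop(struct ws_stack *s, bool *last)
{
	struct ws_node *old = atomic_load(&s->head);

	while (old != NULL) {
		struct ws_node *next = old->next;
		if (atomic_compare_exchange_weak(&s->head, &old, next)) {
			*last = (next == NULL);
			return old;
		}
	}
	*last = false;
	return NULL;
}

/* Consumer side: sleep until a token arrives, then pop until "last". */
static unsigned long ws_consume(struct ws_stack *s, unsigned long total)
{
	unsigned long done = 0;
	bool last;

	while (done < total) {
		pthread_mutex_lock(&s->lock);
		while (s->tokens == 0)
			pthread_cond_wait(&s->cond, &s->lock);
		s->tokens--;
		pthread_mutex_unlock(&s->lock);
		do {
			/* a token implies a non-empty stack */
			if (ws_pop(s, &last) == NULL)
				abort();
			done++;
		} while (!last);
	}
	return done;
}

static void *ws_producer(void *arg)
{
	struct ws_stack *s = arg;

	for (unsigned long i = 0; i < s->n; i++)
		ws_push(s, &s->nodes[i]);
	return NULL;
}

/* Demo: one producer thread and one consumer; returns nodes consumed. */
static unsigned long ws_run_demo(unsigned long n)
{
	struct ws_stack s = { .n = n };
	pthread_t producer;
	unsigned long consumed;

	atomic_init(&s.head, NULL);
	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.cond, NULL);
	s.nodes = calloc(n ? n : 1, sizeof(*s.nodes));
	if (s.nodes == NULL || pthread_create(&producer, NULL, ws_producer, &s) != 0)
		abort();
	consumed = ws_consume(&s, n);
	pthread_join(producer, NULL);
	free(s.nodes);
	return consumed;
}
```

With M consumers behind a mutex, the same token scheme would still work, but each token would be drained by a single consumer at a time.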
> > 
> > If instead we choose to simply wait until one of the M threads discovers
> > that the stack is actually empty, we issue an extra "pop" (which fails)
> > each time the stack becomes empty. In the worst case, if the stack
> > always flips between 0 and 1 elements, we double the number of "pop"
> > operations needed to handle the same number of nodes.
> > 
> > Otherwise, if we choose to add an explicit check to see whether the
> > stack is empty, we are adding an extra load of the head node for every
> > pop.
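The worst-case arithmetic above can be checked with a tiny single-threaded model (hypothetical `cnt_*` names; concurrency does not change the count). With one push followed by a drain, a consumer that stops only when a pop fails needs two pops per node, while one that stops on `last` needs one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model with hypothetical names; only pop counts matter. */
struct cnt_node {
	struct cnt_node *next;
};

struct cnt_stack {
	struct cnt_node *head;
	unsigned long pops;	/* every pop attempt, including failed ones */
};

static void cnt_push(struct cnt_stack *s, struct cnt_node *node)
{
	node->next = s->head;
	s->head = node;
}

static struct cnt_node *cnt_pop(struct cnt_stack *s, bool *last)
{
	struct cnt_node *node = s->head;

	s->pops++;
	if (node == NULL) {
		*last = false;
		return NULL;
	}
	s->head = node->next;
	*last = (s->head == NULL);
	return node;
}

/*
 * Stack flips between 0 and 1 elements: one push, then a drain.
 * use_last: stop on the pop reporting last; otherwise stop only once
 * a pop fails on the empty stack.
 */
static unsigned long cnt_flip_workload(unsigned long items, bool use_last)
{
	struct cnt_stack s = { NULL, 0 };
	struct cnt_node node;
	bool last;

	for (unsigned long i = 0; i < items; i++) {
		cnt_push(&s, &node);
		if (use_last) {
			do {
				(void)cnt_pop(&s, &last);
			} while (!last);
		} else {
			while (cnt_pop(&s, &last) != NULL)
				;
		}
	}
	return s.pops;
}
```

For 1000 nodes this gives 1000 pops when stopping on `last` and 2000 pops when stopping on a failed pop.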
> > 
> > Another use-case I see is low-overhead monitoring of stack usage
> > efficiency. For this kind of use-case, we might want to know, in both
> > push and pop threads, whether we are underutilizing our system
> > resources. Being able to detect that we reach the empty state without
> > any extra stack memory traffic gives us this ability.
> > 
> > I must admit that the use-cases for returning whether pop takes the
> > last element are not as strong as the batching case with push/pop_all,
> > mainly because, AFAIU, we can achieve the same result with an extra
> > check of the stack's emptiness (either an explicit empty() check, or an
> > extra pop that sees an empty stack). What we save here is the overhead
> > that this extra check adds to the stack cache lines.
> > 
> > Another use-case, although maybe less compelling, is for validation.
> > With concurrent threads doing push/pop/pop_all operations on the stack,
> > we can perform the following check: If we empty the stack at the end of
> > test execution, the
> > 
> >   number of push-to-empty-stack
> > 
> >       must be equal to the
> > 
> >   number of pop_all-from-non-empty-stack
> >    + number of pop-last-element-from-non-empty-stack
> > 
> > Note that this validation could not be performed unless "pop" returns
> > whether it popped the last stack element (checked atomically with the
> > pop operation). This is a use-case where adding an extra check on the
> > pop side would not work, since the check must be atomic with the pop.
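A compact version of that check, as a hedged standalone sketch rather than the actual test_urcu_wfs.c changes: hypothetical `v_*` names, a CAS-based stack instead of wfstack, producers pushing concurrently, and consumers that serialize their pops and pop_alls with a mutex (mirroring the locking rules). Every transition of the head from non-NULL to NULL is exactly one pop-last or one non-empty pop_all, and every transition from NULL to non-NULL is exactly one push-to-empty, so after the final drain the counts must match.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names for illustration only; not the liburcu test code. */
enum { V_PRODUCERS = 4, V_CONSUMERS = 2, V_PER_PRODUCER = 10000 };

struct vn { struct vn *next; };

static _Atomic(struct vn *) v_head;
static pthread_mutex_t v_pop_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_ulong v_push_empty, v_pop_last, v_pop_all_nonempty;
static atomic_bool v_producers_done;

static bool v_push(struct vn *node)	/* true: pushed into an empty stack */
{
	struct vn *old = atomic_load(&v_head);

	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&v_head, &old, node));
	return old == NULL;
}

static struct vn *v_pop(bool *last)	/* pops must be serialized */
{
	struct vn *old = atomic_load(&v_head);

	while (old != NULL) {
		struct vn *next = old->next;
		if (atomic_compare_exchange_weak(&v_head, &old, next)) {
			*last = (next == NULL);
			return old;
		}
	}
	*last = false;
	return NULL;
}

static void *v_producer(void *arg)
{
	struct vn *nodes = arg;

	for (size_t i = 0; i < V_PER_PRODUCER; i++)
		if (v_push(&nodes[i]))
			atomic_fetch_add(&v_push_empty, 1);
	return NULL;
}

static void *v_consumer(void *arg)
{
	unsigned long op = 0;
	bool last;

	(void)arg;
	while (!atomic_load(&v_producers_done)) {
		pthread_mutex_lock(&v_pop_lock);
		if (++op % 8 == 0) {	/* every 8th operation is a pop_all */
			if (atomic_exchange(&v_head, NULL) != NULL)
				atomic_fetch_add(&v_pop_all_nonempty, 1);
		} else if (v_pop(&last) != NULL && last) {
			atomic_fetch_add(&v_pop_last, 1);
		}
		pthread_mutex_unlock(&v_pop_lock);
	}
	return NULL;
}

/* Run once; true if push-to-empty == pop-last + non-empty pop_all. */
static bool v_run(void)
{
	pthread_t prod[V_PRODUCERS], cons[V_CONSUMERS];
	struct vn *nodes = calloc((size_t)V_PRODUCERS * V_PER_PRODUCER, sizeof(*nodes));
	bool last;

	if (nodes == NULL)
		return false;
	for (int i = 0; i < V_CONSUMERS; i++)
		if (pthread_create(&cons[i], NULL, v_consumer, NULL) != 0)
			abort();
	for (int i = 0; i < V_PRODUCERS; i++)
		if (pthread_create(&prod[i], NULL, v_producer,
				   nodes + (size_t)i * V_PER_PRODUCER) != 0)
			abort();
	for (int i = 0; i < V_PRODUCERS; i++)
		pthread_join(prod[i], NULL);
	atomic_store(&v_producers_done, true);
	for (int i = 0; i < V_CONSUMERS; i++)
		pthread_join(cons[i], NULL);
	/* Drain the rest, counting the final pop-last (as test_end() does). */
	while (v_pop(&last) != NULL)
		if (last)
			atomic_fetch_add(&v_pop_last, 1);
	free(nodes);
	return atomic_load(&v_push_empty) ==
	       atomic_load(&v_pop_last) + atomic_load(&v_pop_all_nonempty);
}
```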
> > 
> > And maybe there are other use-cases that are currently beyond my
> > imagination too.
> > 
> > Thoughts ?
> 
> Sounds like a job for a separate API member that can be added when
> needed.  I do admit that you have legitimate use cases, but I do not
> believe that they will be the common case.

Agreed, this won't be the common case.

I'll propose new API members for this.

Thanks,

Mathieu

> 
> Maybe due to failure of imagination on my part, but...  ;-)
> 
>                                                       Thanx, Paul
> 
> > Thanks,
> > 
> > Mathieu
> > 
> > 
> > > 
> > >                                                   Thanx, Paul
> > > 
> > > > Thanks,
> > > > 
> > > > Mathieu
> > > > 
> > > > ---
> > > > diff --git a/tests/test_urcu_wfcq.c b/tests/test_urcu_wfcq.c
> > > > index 91285a5..de9566d 100644
> > > > --- a/tests/test_urcu_wfcq.c
> > > > +++ b/tests/test_urcu_wfcq.c
> > > > @@ -168,6 +168,7 @@ static DEFINE_URCU_TLS(unsigned long long, nr_successful_dequeues);
> > > >  static DEFINE_URCU_TLS(unsigned long long, nr_successful_enqueues);
> > > >  static DEFINE_URCU_TLS(unsigned long long, nr_empty_dest_enqueues);
> > > >  static DEFINE_URCU_TLS(unsigned long long, nr_splice);
> > > > +static DEFINE_URCU_TLS(unsigned long long, nr_dequeue_last);
> > > > 
> > > >  static unsigned int nr_enqueuers;
> > > >  static unsigned int nr_dequeuers;
> > > > @@ -228,11 +229,15 @@ fail:
> > > >  static void do_test_dequeue(enum test_sync sync)
> > > >  {
> > > >         struct cds_wfcq_node *node;
> > > > +       bool last;
> > > > 
> > > >         if (sync == TEST_SYNC_MUTEX)
> > > > -               node = cds_wfcq_dequeue_blocking(&head, &tail);
> > > > +               node = cds_wfcq_dequeue_blocking(&head, &tail, &last);
> > > >         else
> > > > -               node = __cds_wfcq_dequeue_blocking(&head, &tail);
> > > > +               node = __cds_wfcq_dequeue_blocking(&head, &tail, &last);
> > > > +
> > > > +       if (last)
> > > > +               URCU_TLS(nr_dequeue_last)++;
> > > > 
> > > >         if (node) {
> > > >                 free(node);
> > > > @@ -263,6 +268,7 @@ static void do_test_splice(enum test_sync sync)
> > > >                 break;
> > > >         case CDS_WFCQ_RET_DEST_EMPTY:
> > > >                 URCU_TLS(nr_splice)++;
> > > > +               URCU_TLS(nr_dequeue_last)++;
> > > >                 /* ok */
> > > >                 break;
> > > >         case CDS_WFCQ_RET_DEST_NON_EMPTY:
> > > > @@ -325,16 +331,21 @@ static void *thr_dequeuer(void *_count)
> > > >         count[0] = URCU_TLS(nr_dequeues);
> > > >         count[1] = URCU_TLS(nr_successful_dequeues);
> > > >         count[2] = URCU_TLS(nr_splice);
> > > > +       count[3] = URCU_TLS(nr_dequeue_last);
> > > >         return ((void*)2);
> > > >  }
> > > > 
> > > > -static void test_end(unsigned long long *nr_dequeues)
> > > > +static void test_end(unsigned long long *nr_dequeues,
> > > > +               unsigned long long *nr_dequeue_last)
> > > >  {
> > > >         struct cds_wfcq_node *node;
> > > > +       bool last;
> > > > 
> > > >         do {
> > > > -               node = cds_wfcq_dequeue_blocking(&head, &tail);
> > > > +               node = cds_wfcq_dequeue_blocking(&head, &tail, &last);
> > > >                 if (node) {
> > > > +                       if (last)
> > > > +                               (*nr_dequeue_last)++;
> > > >                         free(node);
> > > >                         (*nr_dequeues)++;
> > > >                 }
> > > > @@ -367,7 +378,7 @@ int main(int argc, char **argv)
> > > >         unsigned long long tot_successful_enqueues = 0,
> > > >                            tot_successful_dequeues = 0,
> > > >                            tot_empty_dest_enqueues = 0,
> > > > -                          tot_splice = 0;
> > > > +                          tot_splice = 0, tot_dequeue_last = 0;
> > > >         unsigned long long end_dequeues = 0;
> > > >         int i, a, retval = 0;
> > > > 
> > > > @@ -480,7 +491,7 @@ int main(int argc, char **argv)
> > > >         tid_enqueuer = malloc(sizeof(*tid_enqueuer) * nr_enqueuers);
> > > >         tid_dequeuer = malloc(sizeof(*tid_dequeuer) * nr_dequeuers);
> > > >         count_enqueuer = malloc(3 * sizeof(*count_enqueuer) * nr_enqueuers);
> > > > -       count_dequeuer = malloc(3 * sizeof(*count_dequeuer) * nr_dequeuers);
> > > > +       count_dequeuer = malloc(4 * sizeof(*count_dequeuer) * nr_dequeuers);
> > > >         cds_wfcq_init(&head, &tail);
> > > > 
> > > >         next_aff = 0;
> > > > @@ -493,7 +504,7 @@ int main(int argc, char **argv)
> > > >         }
> > > >         for (i = 0; i < nr_dequeuers; i++) {
> > > >                 err = pthread_create(&tid_dequeuer[i], NULL, thr_dequeuer,
> > > > -                                    &count_dequeuer[3 * i]);
> > > > +                                    &count_dequeuer[4 * i]);
> > > >                 if (err != 0)
> > > >                         exit(1);
> > > >         }
> > > > @@ -533,34 +544,37 @@ int main(int argc, char **argv)
> > > >                 err = pthread_join(tid_dequeuer[i], &tret);
> > > >                 if (err != 0)
> > > >                         exit(1);
> > > > -               tot_dequeues += count_dequeuer[3 * i];
> > > > -               tot_successful_dequeues += count_dequeuer[3 * i + 1];
> > > > -               tot_splice += count_dequeuer[3 * i + 2];
> > > > +               tot_dequeues += count_dequeuer[4 * i];
> > > > +               tot_successful_dequeues += count_dequeuer[4 * i + 1];
> > > > +               tot_splice += count_dequeuer[4 * i + 2];
> > > > +               tot_dequeue_last += count_dequeuer[4 * i + 3];
> > > >         }
> > > >         
> > > > -       test_end(&end_dequeues);
> > > > +       test_end(&end_dequeues, &tot_dequeue_last);
> > > > 
> > > >         printf_verbose("total number of enqueues : %llu, dequeues %llu\n",
> > > >                        tot_enqueues, tot_dequeues);
> > > >         printf_verbose("total number of successful enqueues : %llu, "
> > > >                        "enqueues to empty dest : %llu, "
> > > >                        "successful dequeues %llu, "
> > > > -                      "splice : %llu\n",
> > > > +                      "splice : %llu, dequeue_last : %llu\n",
> > > >                        tot_successful_enqueues,
> > > >                        tot_empty_dest_enqueues,
> > > >                        tot_successful_dequeues,
> > > > -                      tot_splice);
> > > > +                      tot_splice, tot_dequeue_last);
> > > >         printf("SUMMARY %-25s testdur %4lu nr_enqueuers %3u wdelay %6lu "
> > > >                 "nr_dequeuers %3u "
> > > >                 "rdur %6lu nr_enqueues %12llu nr_dequeues %12llu "
> > > >                 "successful enqueues %12llu enqueues to empty dest %12llu "
> > > >                 "successful dequeues %12llu splice %12llu "
> > > > +               "dequeue_last %llu "
> > > >                 "end_dequeues %llu nr_ops %12llu\n",
> > > >                 argv[0], duration, nr_enqueuers, wdelay,
> > > >                 nr_dequeuers, rduration, tot_enqueues, tot_dequeues,
> > > >                 tot_successful_enqueues,
> > > >                 tot_empty_dest_enqueues,
> > > > -               tot_successful_dequeues, tot_splice, end_dequeues,
> > > > +               tot_successful_dequeues, tot_splice, tot_dequeue_last,
> > > > +               end_dequeues,
> > > >                 tot_enqueues + tot_dequeues);
> > > > 
> > > >         if (tot_successful_enqueues != tot_successful_dequeues + end_dequeues) {
> > > > @@ -576,12 +590,11 @@ int main(int argc, char **argv)
> > > >          * exactly as many empty queues than the number of non-empty
> > > >          * src splice.
> > > >          */
> > > > -       if (test_wait_empty && test_splice && !test_dequeue
> > > > -                       && tot_empty_dest_enqueues != tot_splice) {
> > > > +       if (tot_empty_dest_enqueues != tot_dequeue_last) {
> > > >                 printf("WARNING! Discrepancy between empty enqueue (%llu) and "
> > > > -                       "number of non-empty splice (%llu)\n",
> > > > +                       "number of dequeue of last element (%llu)\n",
> > > >                         tot_empty_dest_enqueues,
> > > > -                       tot_splice);
> > > > +                       tot_dequeue_last);
> > > >                 retval = 1;
> > > >         }
> > > >         free(count_enqueuer);
> > > > diff --git a/tests/test_urcu_wfs.c b/tests/test_urcu_wfs.c
> > > > index 259ca24..6c54153 100644
> > > > --- a/tests/test_urcu_wfs.c
> > > > +++ b/tests/test_urcu_wfs.c
> > > > @@ -171,6 +171,7 @@ static DEFINE_URCU_TLS(unsigned long long, nr_successful_dequeues);
> > > >  static DEFINE_URCU_TLS(unsigned long long, nr_successful_enqueues);
> > > >  static DEFINE_URCU_TLS(unsigned long long, nr_empty_dest_enqueues);
> > > >  static DEFINE_URCU_TLS(unsigned long long, nr_pop_all);
> > > > +static DEFINE_URCU_TLS(unsigned long long, nr_pop_last);
> > > > 
> > > >  static unsigned int nr_enqueuers;
> > > >  static unsigned int nr_dequeuers;
> > > > @@ -230,14 +231,17 @@ fail:
> > > >  static void do_test_pop(enum test_sync sync)
> > > >  {
> > > >         struct cds_wfs_node *node;
> > > > +       bool last;
> > > > 
> > > >         if (sync == TEST_SYNC_MUTEX)
> > > >                 cds_wfs_pop_lock(&s);
> > > > -       node = __cds_wfs_pop_blocking(&s);
> > > > +       node = __cds_wfs_pop_blocking(&s, &last);
> > > >         if (sync == TEST_SYNC_MUTEX)
> > > >                 cds_wfs_pop_unlock(&s);
> > > > 
> > > >         if (node) {
> > > > +               if (last)
> > > > +                       URCU_TLS(nr_pop_last)++;
> > > >                 free(node);
> > > >                 URCU_TLS(nr_successful_dequeues)++;
> > > >         }
> > > > @@ -260,6 +264,7 @@ static void do_test_pop_all(enum test_sync sync)
> > > >                 return;
> > > > 
> > > >         URCU_TLS(nr_pop_all)++;
> > > > +       URCU_TLS(nr_pop_last)++;
> > > > 
> > > >         cds_wfs_for_each_blocking_safe(head, node, n) {
> > > >                 free(node);
> > > > @@ -308,24 +313,30 @@ static void *thr_dequeuer(void *_count)
> > > > 
> > > >         printf_verbose("dequeuer thread_end, thread id : %lx, tid %lu, "
> > > >                        "dequeues %llu, successful_dequeues %llu "
> > > > -                      "pop_all %llu\n",
> > > > +                      "pop_all %llu pop_last %llu\n",
> > > >                        pthread_self(),
> > > >                         (unsigned long) gettid(),
> > > >                        URCU_TLS(nr_dequeues), URCU_TLS(nr_successful_dequeues),
> > > > -                      URCU_TLS(nr_pop_all));
> > > > +                      URCU_TLS(nr_pop_all),
> > > > +                      URCU_TLS(nr_pop_last));
> > > >         count[0] = URCU_TLS(nr_dequeues);
> > > >         count[1] = URCU_TLS(nr_successful_dequeues);
> > > >         count[2] = URCU_TLS(nr_pop_all);
> > > > +       count[3] = URCU_TLS(nr_pop_last);
> > > >         return ((void*)2);
> > > >  }
> > > > 
> > > > -static void test_end(struct cds_wfs_stack *s, unsigned long long *nr_dequeues)
> > > > +static void test_end(struct cds_wfs_stack *s, unsigned long long *nr_dequeues,
> > > > +               unsigned long long *nr_pop_last)
> > > >  {
> > > >         struct cds_wfs_node *node;
> > > > +       bool last;
> > > > 
> > > >         do {
> > > > -               node = cds_wfs_pop_blocking(s);
> > > > +               node = cds_wfs_pop_blocking(s, &last);
> > > >                 if (node) {
> > > > +                       if (last)
> > > > +                               (*nr_pop_last)++;
> > > >                         free(node);
> > > >                         (*nr_dequeues)++;
> > > >                 }
> > > > @@ -358,7 +369,7 @@ int main(int argc, char **argv)
> > > >         unsigned long long tot_successful_enqueues = 0,
> > > >                            tot_successful_dequeues = 0,
> > > >                            tot_empty_dest_enqueues = 0,
> > > > -                          tot_pop_all = 0;
> > > > +                          tot_pop_all = 0, tot_pop_last = 0;
> > > >         unsigned long long end_dequeues = 0;
> > > >         int i, a, retval = 0;
> > > > 
> > > > @@ -471,7 +482,7 @@ int main(int argc, char **argv)
> > > >         tid_enqueuer = malloc(sizeof(*tid_enqueuer) * nr_enqueuers);
> > > >         tid_dequeuer = malloc(sizeof(*tid_dequeuer) * nr_dequeuers);
> > > >         count_enqueuer = malloc(3 * sizeof(*count_enqueuer) * nr_enqueuers);
> > > > -       count_dequeuer = malloc(3 * sizeof(*count_dequeuer) * nr_dequeuers);
> > > > +       count_dequeuer = malloc(4 * sizeof(*count_dequeuer) * nr_dequeuers);
> > > >         cds_wfs_init(&s);
> > > > 
> > > >         next_aff = 0;
> > > > @@ -484,7 +495,7 @@ int main(int argc, char **argv)
> > > >         }
> > > >         for (i = 0; i < nr_dequeuers; i++) {
> > > >                 err = pthread_create(&tid_dequeuer[i], NULL, thr_dequeuer,
> > > > -                                    &count_dequeuer[3 * i]);
> > > > +                                    &count_dequeuer[4 * i]);
> > > >                 if (err != 0)
> > > >                         exit(1);
> > > >         }
> > > > @@ -524,34 +535,36 @@ int main(int argc, char **argv)
> > > >                 err = pthread_join(tid_dequeuer[i], &tret);
> > > >                 if (err != 0)
> > > >                         exit(1);
> > > > -               tot_dequeues += count_dequeuer[3 * i];
> > > > -               tot_successful_dequeues += count_dequeuer[3 * i + 1];
> > > > -               tot_pop_all += count_dequeuer[3 * i + 2];
> > > > +               tot_dequeues += count_dequeuer[4 * i];
> > > > +               tot_successful_dequeues += count_dequeuer[4 * i + 1];
> > > > +               tot_pop_all += count_dequeuer[4 * i + 2];
> > > > +               tot_pop_last += count_dequeuer[4 * i + 3];
> > > >         }
> > > >         
> > > > -       test_end(&s, &end_dequeues);
> > > > +       test_end(&s, &end_dequeues, &tot_pop_last);
> > > > 
> > > >         printf_verbose("total number of enqueues : %llu, dequeues %llu\n",
> > > >                        tot_enqueues, tot_dequeues);
> > > >         printf_verbose("total number of successful enqueues : %llu, "
> > > >                        "enqueues to empty dest : %llu, "
> > > >                        "successful dequeues %llu, "
> > > > -                      "pop_all : %llu\n",
> > > > +                      "pop_all : %llu, pop_last : %llu\n",
> > > >                        tot_successful_enqueues,
> > > >                        tot_empty_dest_enqueues,
> > > >                        tot_successful_dequeues,
> > > > -                      tot_pop_all);
> > > > +                      tot_pop_all, tot_pop_last);
> > > >         printf("SUMMARY %-25s testdur %4lu nr_enqueuers %3u wdelay %6lu "
> > > >                 "nr_dequeuers %3u "
> > > >                 "rdur %6lu nr_enqueues %12llu nr_dequeues %12llu "
> > > >                 "successful enqueues %12llu enqueues to empty dest %12llu "
> > > >                 "successful dequeues %12llu pop_all %12llu "
> > > > -               "end_dequeues %llu nr_ops %12llu\n",
> > > > +               "pop_last %llu end_dequeues %llu nr_ops %12llu\n",
> > > >                 argv[0], duration, nr_enqueuers, wdelay,
> > > >                 nr_dequeuers, rduration, tot_enqueues, tot_dequeues,
> > > >                 tot_successful_enqueues,
> > > >                 tot_empty_dest_enqueues,
> > > > -               tot_successful_dequeues, tot_pop_all, end_dequeues,
> > > > +               tot_successful_dequeues, tot_pop_all, tot_pop_last,
> > > > +               end_dequeues,
> > > >                 tot_enqueues + tot_dequeues);
> > > >         if (tot_successful_enqueues != tot_successful_dequeues + end_dequeues) {
> > > >                 printf("WARNING! Discrepancy between nr succ. enqueues %llu vs "
> > > > @@ -561,16 +574,14 @@ int main(int argc, char **argv)
> > > >                 retval = 1;
> > > >         }
> > > >         /*
> > > > -        * If only using pop_all to dequeue, the enqueuer should see
> > > > -        * exactly as many empty queues than the number of non-empty
> > > > -        * stacks dequeued.
> > > > +        * The enqueuer should see exactly as many empty stacks as the
> > > > +        * number of non-empty stacks dequeued.
> > > >          */
> > > > -       if (test_wait_empty && test_pop_all && !test_pop
> > > > -                       && tot_empty_dest_enqueues != tot_pop_all) {
> > > > +       if (tot_empty_dest_enqueues != tot_pop_last) {
> > > >                 printf("WARNING! Discrepancy between empty enqueue (%llu) and "
> > > > -                       "number of non-empty pop_all (%llu)\n",
> > > > +                       "number of pop last (%llu)\n",
> > > >                         tot_empty_dest_enqueues,
> > > > -                       tot_pop_all);
> > > > +                       tot_pop_last);
> > > >                 retval = 1;
> > > >         }
> > > >         free(count_enqueuer);
> > > > diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h
> > > > index 4b3535a..33c99ed 100644
> > > > --- a/urcu/static/wfcqueue.h
> > > > +++ b/urcu/static/wfcqueue.h
> > > > @@ -352,16 +352,23 @@ ___cds_wfcq_next_nonblocking(struct cds_wfcq_head *head,
> > > >  static inline struct cds_wfcq_node *
> > > >  ___cds_wfcq_dequeue(struct cds_wfcq_head *head,
> > > >                 struct cds_wfcq_tail *tail,
> > > > +               bool *last,
> > > >                 int blocking)
> > > >  {
> > > >         struct cds_wfcq_node *node, *next;
> > > > 
> > > > -       if (_cds_wfcq_empty(head, tail))
> > > > +       if (_cds_wfcq_empty(head, tail)) {
> > > > +               if (last)
> > > > +                       *last = 0;
> > > >                 return NULL;
> > > > +       }
> > > > 
> > > >         node = ___cds_wfcq_node_sync_next(&head->node, blocking);
> > > > -       if (!blocking && node == CDS_WFCQ_WOULDBLOCK)
> > > > +       if (!blocking && node == CDS_WFCQ_WOULDBLOCK) {
> > > > +               if (last)
> > > > +                       *last = 0;
> > > >                 return CDS_WFCQ_WOULDBLOCK;
> > > > +       }
> > > > 
> > > >         if ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
> > > >                 /*
> > > > @@ -379,8 +386,11 @@ ___cds_wfcq_dequeue(struct cds_wfcq_head *head,
> > > >                  * content.
> > > >                  */
> > > >                 _cds_wfcq_node_init(&head->node);
> > > > -               if (uatomic_cmpxchg(&tail->p, node, &head->node) == node)
> > > > +               if (uatomic_cmpxchg(&tail->p, node, &head->node) == node) {
> > > > +                       if (last)
> > > > +                               *last = 1;
> > > >                         return node;
> > > > +               }
> > > >                 next = ___cds_wfcq_node_sync_next(node, blocking);
> > > >                 /*
> > > >                  * In nonblocking mode, if we would need to block to
> > > > @@ -389,6 +399,8 @@ ___cds_wfcq_dequeue(struct cds_wfcq_head *head,
> > > >                  */
> > > >                 if (!blocking && next == CDS_WFCQ_WOULDBLOCK) {
> > > >                         head->node.next = node;
> > > > +                       if (last)
> > > > +                               *last = 0;
> > > >                         return CDS_WFCQ_WOULDBLOCK;
> > > >                 }
> > > >         }
> > > > @@ -400,6 +412,8 @@ ___cds_wfcq_dequeue(struct cds_wfcq_head *head,
> > > > 
> > > >         /* Load q->head.next before loading node's content */
> > > >         cmm_smp_read_barrier_depends();
> > > > +       if (last)
> > > > +               *last = 0;
> > > >         return node;
> > > >  }
> > > > 
> > > > @@ -414,9 +428,9 @@ ___cds_wfcq_dequeue(struct cds_wfcq_head *head,
> > > >   */
> > > >  static inline struct cds_wfcq_node *
> > > >  ___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail)
> > > > +               struct cds_wfcq_tail *tail, bool *last)
> > > >  {
> > > > -       return ___cds_wfcq_dequeue(head, tail, 1);
> > > > +       return ___cds_wfcq_dequeue(head, tail, last, 1);
> > > >  }
> > > > 
> > > >  /*
> > > > @@ -427,9 +441,9 @@ ___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head,
> > > >   */
> > > >  static inline struct cds_wfcq_node *
> > > >  ___cds_wfcq_dequeue_nonblocking(struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail)
> > > > +               struct cds_wfcq_tail *tail, bool *last)
> > > >  {
> > > > -       return ___cds_wfcq_dequeue(head, tail, 0);
> > > > +       return ___cds_wfcq_dequeue(head, tail, last, 0);
> > > >  }
> > > > 
> > > >  /*
> > > > @@ -542,12 +556,12 @@ ___cds_wfcq_splice_nonblocking(
> > > >   */
> > > >  static inline struct cds_wfcq_node *
> > > >  _cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail)
> > > > +               struct cds_wfcq_tail *tail, bool *last)
> > > >  {
> > > >         struct cds_wfcq_node *retval;
> > > > 
> > > >         _cds_wfcq_dequeue_lock(head, tail);
> > > > -       retval = ___cds_wfcq_dequeue_blocking(head, tail);
> > > > +       retval = ___cds_wfcq_dequeue_blocking(head, tail, last);
> > > >         _cds_wfcq_dequeue_unlock(head, tail);
> > > >         return retval;
> > > >  }
> > > > diff --git a/urcu/static/wfstack.h b/urcu/static/wfstack.h
> > > > index 9bc9519..2ebda27 100644
> > > > --- a/urcu/static/wfstack.h
> > > > +++ b/urcu/static/wfstack.h
> > > > @@ -161,23 +161,35 @@ ___cds_wfs_node_sync_next(struct cds_wfs_node *node, int blocking)
> > > > 
> > > >  static inline
> > > >  struct cds_wfs_node *
> > > > -___cds_wfs_pop(struct cds_wfs_stack *s, int blocking)
> > > > +___cds_wfs_pop(struct cds_wfs_stack *s, bool *last, int blocking)
> > > >  {
> > > >         struct cds_wfs_head *head, *new_head;
> > > >         struct cds_wfs_node *next;
> > > > 
> > > >         for (;;) {
> > > >                 head = CMM_LOAD_SHARED(s->head);
> > > > -               if (___cds_wfs_end(head))
> > > > +               if (___cds_wfs_end(head)) {
> > > > +                       if (last)
> > > > +                               *last = 0;
> > > >                         return NULL;
> > > > +               }
> > > >                 next = ___cds_wfs_node_sync_next(&head->node, blocking);
> > > > -               if (!blocking && next == CDS_WFS_WOULDBLOCK)
> > > > +               if (!blocking && next == CDS_WFS_WOULDBLOCK) {
> > > > +                       if (last)
> > > > +                               *last = 0;
> > > >                         return CDS_WFS_WOULDBLOCK;
> > > > +               }
> > > >                 new_head = caa_container_of(next, struct cds_wfs_head, node);
> > > > -               if (uatomic_cmpxchg(&s->head, head, new_head) == head)
> > > > +               if (uatomic_cmpxchg(&s->head, head, new_head) == head) {
> > > > +                       if (last)
> > > > +                               *last = ___cds_wfs_end(new_head);
> > > >                         return &head->node;
> > > > -               if (!blocking)
> > > > +               }
> > > > +               if (!blocking) {
> > > > +                       if (last)
> > > > +                               *last = 0;
> > > >                         return CDS_WFS_WOULDBLOCK;
> > > > +               }
> > > >                 /* busy-loop if head changed under us */
> > > >         }
> > > >  }
> > > > @@ -200,9 +212,9 @@ ___cds_wfs_pop(struct cds_wfs_stack *s, int blocking)
> > > >   */
> > > >  static inline
> > > >  struct cds_wfs_node *
> > > > -___cds_wfs_pop_blocking(struct cds_wfs_stack *s)
> > > > +___cds_wfs_pop_blocking(struct cds_wfs_stack *s, bool *last)
> > > >  {
> > > > -       return ___cds_wfs_pop(s, 1);
> > > > +       return ___cds_wfs_pop(s, last, 1);
> > > >  }
> > > > 
> > > >  /*
> > > > @@ -213,9 +225,9 @@ ___cds_wfs_pop_blocking(struct cds_wfs_stack *s)
> > > >   */
> > > >  static inline
> > > >  struct cds_wfs_node *
> > > > -___cds_wfs_pop_nonblocking(struct cds_wfs_stack *s)
> > > > +___cds_wfs_pop_nonblocking(struct cds_wfs_stack *s, bool *last)
> > > >  {
> > > > -       return ___cds_wfs_pop(s, 0);
> > > > +       return ___cds_wfs_pop(s, last, 0);
> > > >  }
> > > > 
> > > >  /*
> > > > @@ -284,12 +296,12 @@ static inline void _cds_wfs_pop_unlock(struct cds_wfs_stack *s)
> > > >   */
> > > >  static inline
> > > >  struct cds_wfs_node *
> > > > -_cds_wfs_pop_blocking(struct cds_wfs_stack *s)
> > > > +_cds_wfs_pop_blocking(struct cds_wfs_stack *s, bool *last)
> > > >  {
> > > >         struct cds_wfs_node *retnode;
> > > > 
> > > >         _cds_wfs_pop_lock(s);
> > > > -       retnode = ___cds_wfs_pop_blocking(s);
> > > > +       retnode = ___cds_wfs_pop_blocking(s, last);
> > > >         _cds_wfs_pop_unlock(s);
> > > >         return retnode;
> > > >  }
> > > > diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h
> > > > index b6be9f3..4b9e73b 100644
> > > > --- a/urcu/wfcqueue.h
> > > > +++ b/urcu/wfcqueue.h
> > > > @@ -197,7 +197,8 @@ extern bool cds_wfcq_enqueue(struct cds_wfcq_head *head,
> > > >   */
> > > >  extern struct cds_wfcq_node *cds_wfcq_dequeue_blocking(
> > > >                 struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail);
> > > > +               struct cds_wfcq_tail *tail,
> > > > +               bool *last);
> > > > 
> > > >  /*
> > > >   * cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q.
> > > > @@ -229,7 +230,8 @@ extern enum cds_wfcq_ret cds_wfcq_splice_blocking(
> > > >   */
> > > >  extern struct cds_wfcq_node *__cds_wfcq_dequeue_blocking(
> > > >                 struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail);
> > > > +               struct cds_wfcq_tail *tail,
> > > > +               bool *last);
> > > > 
> > > >  /*
> > > >   * __cds_wfcq_dequeue_nonblocking: dequeue a node from a wait-free queue.
> > > > @@ -239,7 +241,8 @@ extern struct cds_wfcq_node *__cds_wfcq_dequeue_blocking(
> > > >   */
> > > >  extern struct cds_wfcq_node *__cds_wfcq_dequeue_nonblocking(
> > > >                 struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail);
> > > > +               struct cds_wfcq_tail *tail,
> > > > +               bool *last);
> > > > 
> > > >  /*
> > > >   * __cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q.
> > > > diff --git a/urcu/wfstack.h b/urcu/wfstack.h
> > > > index 03fee8f..1e4b848 100644
> > > > --- a/urcu/wfstack.h
> > > > +++ b/urcu/wfstack.h
> > > > @@ -147,7 +147,8 @@ extern int cds_wfs_push(struct cds_wfs_stack *s, struct cds_wfs_node *node);
> > > >   *
> > > >   * Calls __cds_wfs_pop_blocking with an internal pop mutex held.
> > > >   */
> > > > -extern struct cds_wfs_node *cds_wfs_pop_blocking(struct cds_wfs_stack *s);
> > > > +extern struct cds_wfs_node *cds_wfs_pop_blocking(struct cds_wfs_stack *s,
> > > > +               bool *last);
> > > > 
> > > >  /*
> > > >   * cds_wfs_pop_all_blocking: pop all nodes from a stack.
> > > > @@ -219,7 +220,8 @@ extern void cds_wfs_pop_unlock(struct cds_wfs_stack *s);
> > > >   * 3) Ensuring that only ONE thread can call __cds_wfs_pop_blocking()
> > > >   *    and __cds_wfs_pop_all(). (multi-provider/single-consumer scheme).
> > > >   */
> > > > -extern struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s);
> > > > +extern struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s,
> > > > +               bool *last);
> > > > 
> > > >  /*
> > > >   * __cds_wfs_pop_nonblocking: pop a node from the stack.
> > > > @@ -227,7 +229,8 @@ extern struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s);
> > > >   * Same as __cds_wfs_pop_blocking, but returns CDS_WFS_WOULDBLOCK if
> > > >   * it needs to block.
> > > >   */
> > > > -extern struct cds_wfs_node *__cds_wfs_pop_nonblocking(struct cds_wfs_stack *s);
> > > > +extern struct cds_wfs_node *__cds_wfs_pop_nonblocking(struct cds_wfs_stack *s,
> > > > +               bool *last);
> > > > 
> > > >  /*
> > > >   * __cds_wfs_pop_all: pop all nodes from a stack.
> > > > diff --git a/wfcqueue.c b/wfcqueue.c
> > > > index ab0eb93..7baefdf 100644
> > > > --- a/wfcqueue.c
> > > > +++ b/wfcqueue.c
> > > > @@ -68,9 +68,10 @@ void cds_wfcq_dequeue_unlock(struct cds_wfcq_head *head,
> > > > 
> > > >  struct cds_wfcq_node *cds_wfcq_dequeue_blocking(
> > > >                 struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail)
> > > > +               struct cds_wfcq_tail *tail,
> > > > +               bool *last)
> > > >  {
> > > > -       return _cds_wfcq_dequeue_blocking(head, tail);
> > > > +       return _cds_wfcq_dequeue_blocking(head, tail, last);
> > > >  }
> > > > 
> > > >  enum cds_wfcq_ret cds_wfcq_splice_blocking(
> > > > @@ -85,16 +86,18 @@ enum cds_wfcq_ret cds_wfcq_splice_blocking(
> > > > 
> > > >  struct cds_wfcq_node *__cds_wfcq_dequeue_blocking(
> > > >                 struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail)
> > > > +               struct cds_wfcq_tail *tail,
> > > > +               bool *last)
> > > >  {
> > > > -       return ___cds_wfcq_dequeue_blocking(head, tail);
> > > > +       return ___cds_wfcq_dequeue_blocking(head, tail, last);
> > > >  }
> > > > 
> > > >  struct cds_wfcq_node *__cds_wfcq_dequeue_nonblocking(
> > > >                 struct cds_wfcq_head *head,
> > > > -               struct cds_wfcq_tail *tail)
> > > > +               struct cds_wfcq_tail *tail,
> > > > +               bool *last)
> > > >  {
> > > > -       return ___cds_wfcq_dequeue_nonblocking(head, tail);
> > > > +       return ___cds_wfcq_dequeue_nonblocking(head, tail, last);
> > > >  }
> > > > 
> > > >  enum cds_wfcq_ret __cds_wfcq_splice_blocking(
> > > > diff --git a/wfstack.c b/wfstack.c
> > > > index 4ccb6b9..041703b 100644
> > > > --- a/wfstack.c
> > > > +++ b/wfstack.c
> > > > @@ -48,9 +48,10 @@ int cds_wfs_push(struct cds_wfs_stack *s, struct cds_wfs_node *node)
> > > >         return _cds_wfs_push(s, node);
> > > >  }
> > > > 
> > > > -struct cds_wfs_node *cds_wfs_pop_blocking(struct cds_wfs_stack *s)
> > > > +struct cds_wfs_node *cds_wfs_pop_blocking(struct cds_wfs_stack *s,
> > > > +               bool *last)
> > > >  {
> > > > -       return _cds_wfs_pop_blocking(s);
> > > > +       return _cds_wfs_pop_blocking(s, last);
> > > >  }
> > > > 
> > > >  struct cds_wfs_head *cds_wfs_pop_all_blocking(struct cds_wfs_stack *s)
> > > > @@ -83,14 +84,16 @@ void cds_wfs_pop_unlock(struct cds_wfs_stack *s)
> > > >         _cds_wfs_pop_unlock(s);
> > > >  }
> > > > 
> > > > -struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s)
> > > > +struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s,
> > > > +               bool *last)
> > > >  {
> > > > -       return ___cds_wfs_pop_blocking(s);
> > > > +       return ___cds_wfs_pop_blocking(s, last);
> > > >  }
> > > > 
> > > > -struct cds_wfs_node *__cds_wfs_pop_nonblocking(struct cds_wfs_stack *s)
> > > > +struct cds_wfs_node *__cds_wfs_pop_nonblocking(struct cds_wfs_stack *s,
> > > > +               bool *last)
> > > >  {
> > > > -       return ___cds_wfs_pop_nonblocking(s);
> > > > +       return ___cds_wfs_pop_nonblocking(s, last);
> > > >  }
> > > > 
> > > >  struct cds_wfs_head *__cds_wfs_pop_all(struct cds_wfs_stack *s)
> > > > 
> > > > -- 
> > > > Mathieu Desnoyers
> > > > Operating System Efficiency R&D Consultant
> > > > EfficiOS Inc.
> > > > http://www.efficios.com
> > > > 
> > > 
> > 
> > _______________________________________________
> > rp mailing list
> > r...@svcs.cs.pdx.edu
> > http://svcs.cs.pdx.edu/mailman/listinfo/rp
> > 
> 
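
To make the stack semantics of the proposed `bool *last` out-parameter concrete, here is a minimal sketch under the same single-consumer rule the patch documents (only ONE thread may pop at a time, pushes may be concurrent). It does not use liburcu: `struct node`, `struct stack`, `STACK_END`, `stack_init()`, `stack_push()` and `stack_pop()` are hypothetical, simplified stand-ins written with C11 atomics, not the wfstack code being patched.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT liburcu's cds_wfs_node/cds_wfs_stack. */
struct node {
	struct node *_Atomic next;
};

struct stack {
	struct node *_Atomic head;
};

/* Sentinel marking the bottom of the stack: an empty stack points here. */
static struct node stack_end_marker;
#define STACK_END (&stack_end_marker)

static void stack_init(struct stack *s)
{
	atomic_store(&s->head, STACK_END);
}

/* Wait-free push: returns true if the stack was empty before this push. */
static bool stack_push(struct stack *s, struct node *n)
{
	struct node *old;

	assert(n != STACK_END);
	/* NULL means "link not published yet"; a popper waits on it. */
	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	old = atomic_exchange(&s->head, n);
	atomic_store_explicit(&n->next, old, memory_order_release);
	return old == STACK_END;
}

/*
 * Blocking pop. Callers must serialize pops (e.g. with a mutex); pushes
 * may run concurrently. Returns NULL if the stack is empty. When a node
 * is returned and last is non-NULL, *last tells whether this pop left
 * the stack empty.
 */
static struct node *stack_pop(struct stack *s, bool *last)
{
	for (;;) {
		struct node *head = atomic_load(&s->head);
		struct node *next;

		if (head == STACK_END)
			return NULL;
		/* Busy-wait until the pusher of "head" has published its link. */
		while ((next = atomic_load_explicit(&head->next,
				memory_order_acquire)) == NULL)
			;
		if (atomic_compare_exchange_strong(&s->head, &head, next)) {
			if (last)
				*last = (next == STACK_END);
			return head;
		}
		/* A concurrent push replaced the head: retry with the new one. */
	}
}
```

In the test setup described earlier in the thread, counting how often `stack_push()` returns true and how often a pop reports `last == true` could cross-check empty-state transitions. As Paul pointed out, `last` only describes the instant of the pop: a concurrent push may refill the stack right afterwards.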

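For the queue, "last" is decided differently: the dequeuer can only claim it emptied the queue if it manages to swing the tail back to the head sentinel before an enqueuer appends. The sketch below illustrates that idea with a hypothetical, simplified single-consumer queue (a head sentinel plus an atomically exchanged tail pointer, C11 atomics). `struct qnode`, `struct queue`, `queue_init()`, `queue_enqueue()`, `wait_next()` and `queue_dequeue()` are illustrative names only, not the wfcqueue implementation in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT liburcu's cds_wfcq_head/tail/node. */
struct qnode {
	struct qnode *_Atomic next;
};

struct queue {
	struct qnode dummy;		/* head sentinel: dummy.next is the first node */
	struct qnode *_Atomic tail;	/* last node, or &dummy when empty */
};

static void queue_init(struct queue *q)
{
	atomic_store(&q->dummy.next, NULL);
	atomic_store(&q->tail, &q->dummy);
}

/* Wait-free enqueue: returns true if the queue was empty before. */
static bool queue_enqueue(struct queue *q, struct qnode *n)
{
	struct qnode *old_tail;

	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	old_tail = atomic_exchange(&q->tail, n);
	/* Link the previous tail (or the sentinel) to the new node. */
	atomic_store_explicit(&old_tail->next, n, memory_order_release);
	return old_tail == &q->dummy;
}

/* Busy-wait until *p holds a non-NULL link published by an enqueuer. */
static struct qnode *wait_next(struct qnode *_Atomic *p)
{
	struct qnode *v;

	while ((v = atomic_load_explicit(p, memory_order_acquire)) == NULL)
		;
	return v;
}

/*
 * Blocking dequeue. Callers must serialize dequeues; enqueues may run
 * concurrently. Returns NULL if the queue is empty. When a node is
 * returned and last is non-NULL, *last tells whether this dequeue left
 * the queue empty.
 */
static struct qnode *queue_dequeue(struct queue *q, bool *last)
{
	struct qnode *node, *next;

	if (atomic_load(&q->tail) == &q->dummy)
		return NULL;
	node = wait_next(&q->dummy.next);
	next = atomic_load_explicit(&node->next, memory_order_acquire);
	if (next == NULL) {
		struct qnode *expected = node;

		/* node looks like the last one: try to swing tail back to dummy. */
		atomic_store_explicit(&q->dummy.next, NULL, memory_order_relaxed);
		if (atomic_compare_exchange_strong(&q->tail, &expected, &q->dummy)) {
			if (last)
				*last = true;
			return node;
		}
		/* An enqueue raced with us: wait for it to link node->next. */
		next = wait_next(&node->next);
	}
	atomic_store_explicit(&q->dummy.next, next, memory_order_release);
	if (last)
		*last = false;
	return node;
}
```

The compare-and-exchange on the tail is what makes `last == true` meaningful here: if it fails, an enqueuer has already appended after `node`, so the dequeue correctly reports that the queue is not empty.
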
-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
