Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include <odp/api/hints.h> +#include <odp_align_internal.h> + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], + uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head;
Comment: OK > Petri Savolainen(psavol) wrote: > ``` > if (ring_st_is_empty(&queue->s.ring_st)) > > if (ring_st_is_empty(&queue->s.ring_st)) > > if (ring_st_is_empty(&queue->s.ring_st) == 0) > > if (ring_st_is_empty(&queue->s.ring_st)) > > if (ring_st_is_empty(&queue->s.ring_st)) > > if (!ring_st_is_empty(&queue->s.ring_st)) > > if (ring_st_is_empty(&queue->s.ring_st)) > > if (ring_st_is_empty(&queue->s.ring_st)) > ``` > > Find if clause for "ring is not empty" from picture above. >> Petri Savolainen(psavol) wrote: >> It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail >> is always ahead of head, but no more than queue size (e.g. 4k), So, e.g. >> when tail is 0, and head is 0xffffffff the number of items in ring is 0 - >> 0xffffffff = 1 >> >> ``` >> head tail >> hex [index], hex [index], num, num_free >> 0xfffffffe [6], 0xffffffff [7], 1, 7 >> 0xffffffff [7], 0x00000000 [0], 1, 7 >> 0x00000000 [0], 0x00000001 [1], 1, 7 >> ``` >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Please see above discussion. The comment stands. >>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>> Not true. See the comment below where it's clear that `head` and `tail` >>>> are free running and thus will eventually wrap as described above. If >>>> these were `uint64_t` variables you could argue that the wraps would take >>>> centuries and hence are nothing to worry about, but being `uint32_t` >>>> variables this should be expected to happen regularly on heavily used >>>> queues. >>>> >>>> `abs()` adds no overhead since compilers treat it as an intrinsic. >>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>> Note that neither `head` nor `tail` get masked when doing enqueues or >>>>> dequeues, so they will traverse the entire 32-bit range of the variable >>>>> over time. While the ring itself will never hold more than `size` >>>>> entries, `head` and `tail` will wrap around. When you index into the ring >>>>> you do so via a mask (`idx = head & mask;`), but calculating the number >>>>> of elements in the ring doesn't do this, which is why `abs()` is needed. >>>>>> Petri Savolainen(psavol) wrote: >>>>>> Plus this is done only once - in pool create phase. >>>>>>> Petri Savolainen(psavol) wrote: >>>>>>> No since ring size will be much smaller than 4 billion. >>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>> The point is that ring size will never be close to 4 billion entries. >>>>>>>> E.g. currently tail is always max 4096 larger than head. Your example >>>>>>>> above is based assumption of 4 billion entry ring. Overflow is avoided >>>>>>>> when ring size if <2 billion, as 32 bit indexes can be still used to >>>>>>>> calculate number of items correctly. >>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>> Agreed. >>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>> Agreed, this is good for now. Later we may wish to honor the >>>>>>>>>> user-requested queue `size` parameter. >>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>> Agreed. >>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>> Correct, as noted earlier. I withdraw that comment. >>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>> Presumably the compiler will see the overlap and optimize away >>>>>>>>>>>>> the redundancy, so I assume the performance impact will be nil >>>>>>>>>>>>> here. >>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>> Since you're allowing `head` and `tail` to run over the entire >>>>>>>>>>>>>> 32-bit range, you're correct that you can completely fill the >>>>>>>>>>>>>> ring. I did miss that point. However, as shown above you still >>>>>>>>>>>>>> need this to be: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> num = size - abs(tail - head); >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> To avoid problems at the 32-bit wrap boundary. >>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>> You're computing in 32 bits not 8 bits, and your ring size is >>>>>>>>>>>>>>> less than 2^32 elements. Consider the following test program: >>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>> #include <stdlib.h> >>>>>>>>>>>>>>> #include <stdio.h> >>>>>>>>>>>>>>> #include <stdint.h> >>>>>>>>>>>>>>> #include <inttypes.h> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> void main() { >>>>>>>>>>>>>>> uint32_t head[4] = {0, 1, 2, 3}; >>>>>>>>>>>>>>> uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, >>>>>>>>>>>>>>> 0xfffffffd}; >>>>>>>>>>>>>>> uint32_t mask = 4095; >>>>>>>>>>>>>>> uint32_t result; >>>>>>>>>>>>>>> int i; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> for (i = 0; i < 4; i++) { >>>>>>>>>>>>>>> printf("head = %" PRIu32 " tail = %" PRIu32 >>>>>>>>>>>>>>> ":\n", >>>>>>>>>>>>>>> head[i], tail[i]); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> result = tail[i] - head[i]; >>>>>>>>>>>>>>> printf("tail - head = %" PRIu32 "\n", result); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> result = (tail[i] - head[i]) & mask; >>>>>>>>>>>>>>> printf("(tail - head) & mask = %" PRIu32 "\n", >>>>>>>>>>>>>>> result); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> result = abs(tail[i] - head[i]); >>>>>>>>>>>>>>> printf("abs(tail - head) = %" PRIu32 "\n\n", >>>>>>>>>>>>>>> result); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>> in theory `tail - head` should be the number of elements in the >>>>>>>>>>>>>>> ring, in this case 0, 2, 4, and 6. But running this test >>>>>>>>>>>>>>> program gives the following output: >>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>> head = 0 tail = 0: >>>>>>>>>>>>>>> tail - head = 0 >>>>>>>>>>>>>>> (tail - head) & mask = 0 >>>>>>>>>>>>>>> abs(tail - head) = 0 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> head = 1 tail = 4294967295: >>>>>>>>>>>>>>> tail - head = 4294967294 >>>>>>>>>>>>>>> (tail - head) & mask = 4094 >>>>>>>>>>>>>>> abs(tail - head) = 2 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> head = 2 tail = 4294967294: >>>>>>>>>>>>>>> tail - head = 4294967292 >>>>>>>>>>>>>>> (tail - head) & mask = 4092 >>>>>>>>>>>>>>> abs(tail - head) = 4 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> head = 3 tail = 4294967293: >>>>>>>>>>>>>>> tail - head = 4294967290 >>>>>>>>>>>>>>> (tail - head) & mask = 4090 >>>>>>>>>>>>>>> abs(tail - head) = 6 >>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>> Since you're allowing head to run free over the 32-bit range of >>>>>>>>>>>>>>> the variable, when the 32-bits rolls over you'll get a large >>>>>>>>>>>>>>> positive number, not the small one you need to stay within the >>>>>>>>>>>>>>> ring bounds. The alternative is to mask `head` and `tail` as >>>>>>>>>>>>>>> you increment them, but then you run into the effective range >>>>>>>>>>>>>>> issue. >>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>> OK >>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging. >>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>> That functionality is not obvious from the name. It either >>>>>>>>>>>>>>>>>> implies that one of the input arguments is written (not true >>>>>>>>>>>>>>>>>> here) or the reader might assume that it is an expression >>>>>>>>>>>>>>>>>> without side-effect and should be deleted (what I originally >>>>>>>>>>>>>>>>>> thought when reading it). You should pick a routine name >>>>>>>>>>>>>>>>>> that makes it clear it's actually doing something real, in >>>>>>>>>>>>>>>>>> this case performing prefetch processing. >>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >>>>>>>>>>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from >>>>>>>>>>>>>>>>>>>> QoS point of view. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I'm planning to use config file for run time tunning, so >>>>>>>>>>>>>>>>>>>> this hard coding may change in that phase. >>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>> One entry is not lost. User provided size if not >>>>>>>>>>>>>>>>>>>>>> (currently) used. Queue size is always 4k. >>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> OK, added checks in v2. >>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed >>>>>>>>>>>>>>>>>>>>>>>>>> in v2. >>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t >>>>>>>>>>>>>>>>>>>>>>>>>>> and do not wrap around when the ring is full. I >>>>>>>>>>>>>>>>>>>>>>>>>>> think you assume that the store index is >>>>>>>>>>>>>>>>>>>>>>>>>>> 0...size-1, while it's full uint32_t which is then >>>>>>>>>>>>>>>>>>>>>>>>>>> masked to get the actual index. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> For example: >>>>>>>>>>>>>>>>>>>>>>>>>>> size = 100; >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Empty: >>>>>>>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 100 >>>>>>>>>>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Full: >>>>>>>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 200 >>>>>>>>>>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100 >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Wrap uint32_t + full: >>>>>>>>>>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C >>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100 >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead >>>>>>>>>>>>>>>>>>>>>>>>>>> of 4095. >>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of >>>>>>>>>>>>>>>>>>>>>>>>>>>> two. */ >>>>>>>>>>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, >>>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t *data, uint32_t size) >>>>>>>>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer header pointers. The counter operation is >>>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side >>>>>>>>>>>>>>>>>>>>>>>>>>>>> effect of the function, which may be >>>>>>>>>>>>>>>>>>>>>>>>>>>>> changed/moved any time if it's found out that >>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's a place for prefetching. I actually plan >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to test if number of prefetches should be limited >>>>>>>>>>>>>>>>>>>>>>>>>>>>> as e.g. 32 consecutive prefetches may be too much >>>>>>>>>>>>>>>>>>>>>>>>>>>>> for some CPU architectures. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '!'. Especially, when the if clause is as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> complex as this and there's danger for reader to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> miss the '!' sign. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also when someone would modify the bitfield >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from two to three fields later on. Similarly to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> memset() zero is used for struct inits. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t variables, wrap a round is handled >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already in the other ring implementation see >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ring_deq_multi(). >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a typedef, since it's easier to spot the type >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> name (as it's not mixed into struct field >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> names). Checkpatch passes so this style >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be OK. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have no added performance benefit? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> noted earlier, if we're not going to use >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the user-supplied queue size. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extraneous statement that should be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deleted. Renaming this to something like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intent clearer here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (!ring_st_is_empty(&queue->s.ring_st))` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems more natural here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONFIG_QUEUE_SIZE)` to handle the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effective queue capacity. The >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied `size` should then be set >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to `ROUNDUP_POWER2_U32(size) - 1` for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the masking to work properly. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entry to distinguish queue empty/full, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this should be returned as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to ensure that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE` is itself a power >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of 2. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.pool` and `index.buffer` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no need to set `index.u32` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partitioning based on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return to that here? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail; >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> empty when `head == tail` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> therefore the queue is full when >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `abs(tail - head) == mask`, so the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> correct calculation here is: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only hold `size - 1` elements, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherwise you cannot distinguish >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> between a full and an empty queue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without another bit of metadata, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which is a cost you're trying to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the caller is trying to be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "optimal" by specifying a power of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> two in the `size` parameter of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason we may wish to return a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `max_size` of a power of 2 - 1 in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of this patch series. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> power of 2. Should be documented >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as such, since this is an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal routine. In this case an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_ASSERT(size == >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this requirement would be a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> useful debugging aid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> head);` to deal with wrap >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misinterpreted as overly large >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since it's `uint32_t`. Note that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GCC and clang recognize `abs()` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and treat it as a builtin, so >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> call here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r170581894 updated_at 2018-02-26 12:56:13