Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 62 @@ -0,0 +1,109 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include <odp/api/hints.h> +#include <odp_align_internal.h> + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], + uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num;
Comment: Note that neither `head` nor `tail` get masked when doing enqueues or dequeues, so they will traverse the entire 32-bit range of the variable over time. While the ring itself will never hold more than `size` entries, `head` and `tail` will wrap around. When you index into the ring you do so via a mask (`idx = head & mask;`), but calculating the number of elements in the ring doesn't do this, which is why `abs()` is needed. > Petri Savolainen(psavol) wrote: > Plus this is done only once - in pool create phase. >> Petri Savolainen(psavol) wrote: >> No since ring size will be much smaller than 4 billion. >>> Petri Savolainen(psavol) wrote: >>> The point is that ring size will never be close to 4 billion entries. E.g. >>> currently tail is always max 4096 larger than head. Your example above is >>> based assumption of 4 billion entry ring. Overflow is avoided when ring >>> size if <2 billion, as 32 bit indexes can be still used to calculate number >>> of items correctly. >>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>> Agreed. >>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>> Agreed, this is good for now. Later we may wish to honor the >>>>> user-requested queue `size` parameter. >>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>> Agreed. >>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>> Correct, as noted earlier. I withdraw that comment. >>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>> Presumably the compiler will see the overlap and optimize away the >>>>>>>> redundancy, so I assume the performance impact will be nil here. >>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>> Since you're allowing `head` and `tail` to run over the entire 32-bit >>>>>>>>> range, you're correct that you can completely fill the ring. I did >>>>>>>>> miss that point. However, as shown above you still need this to be: >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> num = size - abs(tail - head); >>>>>>>>> ``` >>>>>>>>> To avoid problems at the 32-bit wrap boundary. >>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>> You're computing in 32 bits not 8 bits, and your ring size is less >>>>>>>>>> than 2^32 elements. Consider the following test program: >>>>>>>>>> ``` >>>>>>>>>> #include <stdlib.h> >>>>>>>>>> #include <stdio.h> >>>>>>>>>> #include <stdint.h> >>>>>>>>>> #include <inttypes.h> >>>>>>>>>> >>>>>>>>>> void main() { >>>>>>>>>> uint32_t head[4] = {0, 1, 2, 3}; >>>>>>>>>> uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd}; >>>>>>>>>> uint32_t mask = 4095; >>>>>>>>>> uint32_t result; >>>>>>>>>> int i; >>>>>>>>>> >>>>>>>>>> for (i = 0; i < 4; i++) { >>>>>>>>>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >>>>>>>>>> head[i], tail[i]); >>>>>>>>>> >>>>>>>>>> result = tail[i] - head[i]; >>>>>>>>>> printf("tail - head = %" PRIu32 "\n", result); >>>>>>>>>> >>>>>>>>>> result = (tail[i] - head[i]) & mask; >>>>>>>>>> printf("(tail - head) & mask = %" PRIu32 "\n", result); >>>>>>>>>> >>>>>>>>>> result = abs(tail[i] - head[i]); >>>>>>>>>> printf("abs(tail - head) = %" PRIu32 "\n\n", result); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> ``` >>>>>>>>>> in theory `tail - head` should be the number of elements in the >>>>>>>>>> ring, in this case 0, 2, 4, and 6. But running this test program >>>>>>>>>> gives the following output: >>>>>>>>>> ``` >>>>>>>>>> head = 0 tail = 0: >>>>>>>>>> tail - head = 0 >>>>>>>>>> (tail - head) & mask = 0 >>>>>>>>>> abs(tail - head) = 0 >>>>>>>>>> >>>>>>>>>> head = 1 tail = 4294967295: >>>>>>>>>> tail - head = 4294967294 >>>>>>>>>> (tail - head) & mask = 4094 >>>>>>>>>> abs(tail - head) = 2 >>>>>>>>>> >>>>>>>>>> head = 2 tail = 4294967294: >>>>>>>>>> tail - head = 4294967292 >>>>>>>>>> (tail - head) & mask = 4092 >>>>>>>>>> abs(tail - head) = 4 >>>>>>>>>> >>>>>>>>>> head = 3 tail = 4294967293: >>>>>>>>>> tail - head = 4294967290 >>>>>>>>>> (tail - head) & mask = 4090 >>>>>>>>>> abs(tail - head) = 6 >>>>>>>>>> ``` >>>>>>>>>> Since you're allowing head to run free over the 32-bit range of the >>>>>>>>>> variable, when the 32-bits rolls over you'll get a large positive >>>>>>>>>> number, not the small one you need to stay within the ring bounds. >>>>>>>>>> The alternative is to mask `head` and `tail` as you increment them, >>>>>>>>>> but then you run into the effective range issue. >>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>> OK >>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging. >>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>> That functionality is not obvious from the name. It either >>>>>>>>>>>>> implies that one of the input arguments is written (not true >>>>>>>>>>>>> here) or the reader might assume that it is an expression without >>>>>>>>>>>>> side-effect and should be deleted (what I originally thought when >>>>>>>>>>>>> reading it). You should pick a routine name that makes it clear >>>>>>>>>>>>> it's actually doing something real, in this case performing >>>>>>>>>>>>> prefetch processing. >>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >>>>>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from QoS >>>>>>>>>>>>>>> point of view. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm planning to use config file for run time tunning, so this >>>>>>>>>>>>>>> hard coding may change in that phase. >>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>> One entry is not lost. User provided size if not (currently) >>>>>>>>>>>>>>>>> used. Queue size is always 4k. >>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>> OK, added checks in v2. >>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed in v2. >>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do >>>>>>>>>>>>>>>>>>>>>> not wrap around when the ring is full. I think you >>>>>>>>>>>>>>>>>>>>>> assume that the store index is 0...size-1, while it's >>>>>>>>>>>>>>>>>>>>>> full uint32_t which is then masked to get the actual >>>>>>>>>>>>>>>>>>>>>> index. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> For example: >>>>>>>>>>>>>>>>>>>>>> size = 100; >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Empty: >>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>> tail = 100 >>>>>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Full: >>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>> tail = 200 >>>>>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Wrap uint32_t + full: >>>>>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C >>>>>>>>>>>>>>>>>>>>>> tail = 0 >>>>>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of >>>>>>>>>>>>>>>>>>>>>> 4095. >>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. */ >>>>>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, >>>>>>>>>>>>>>>>>>>>>>> uint32_t *data, uint32_t size) >>>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to buffer >>>>>>>>>>>>>>>>>>>>>>>> header pointers. The counter operation is >>>>>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side effect >>>>>>>>>>>>>>>>>>>>>>>> of the function, which may be changed/moved any time >>>>>>>>>>>>>>>>>>>>>>>> if it's found out that there's a place for >>>>>>>>>>>>>>>>>>>>>>>> prefetching. I actually plan to test if number of >>>>>>>>>>>>>>>>>>>>>>>> prefetches should be limited as e.g. 32 consecutive >>>>>>>>>>>>>>>>>>>>>>>> prefetches may be too much for some CPU architectures. >>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. >>>>>>>>>>>>>>>>>>>>>>>>> Especially, when the if clause is as complex as this >>>>>>>>>>>>>>>>>>>>>>>>> and there's danger for reader to miss the '!' sign. >>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also >>>>>>>>>>>>>>>>>>>>>>>>>> when someone would modify the bitfield from two to >>>>>>>>>>>>>>>>>>>>>>>>>> three fields later on. Similarly to memset() zero is >>>>>>>>>>>>>>>>>>>>>>>>>> used for struct inits. >>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t >>>>>>>>>>>>>>>>>>>>>>>>>>> variables, wrap a round is handled already. >>>>>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits: >>>>>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used already >>>>>>>>>>>>>>>>>>>>>>>>>>> in the other ring implementation see >>>>>>>>>>>>>>>>>>>>>>>>>>> ring_deq_multi(). >>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a >>>>>>>>>>>>>>>>>>>>>>>>>>>> typedef, since it's easier to spot the type name >>>>>>>>>>>>>>>>>>>>>>>>>>>> (as it's not mixed into struct field names). >>>>>>>>>>>>>>>>>>>>>>>>>>>> Checkpatch passes so this style should be OK. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no >>>>>>>>>>>>>>>>>>>>>>>>>>>>> added performance benefit? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> earlier, if we're not going to use the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied queue size. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> statement that should be deleted. Renaming this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to something like `prefetch_dequeued_bufs()` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would make the intent clearer here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems more natural here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONFIG_QUEUE_SIZE)` to handle the effective >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> queue capacity. The user-supplied `size` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should then be set to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ROUNDUP_POWER2_U32(size) - 1` for the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> masking to work properly. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to distinguish queue empty/full, this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be returned as `CONFIG_QUEUE_SIZE - >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1`, and we also need to ensure that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE` is itself a power of 2. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.buffer` there's no need to set >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.u32` here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> based on `ODP_CONFIG_POOLS`. Do we want >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to return to that here? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail; >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when `head == tail` therefore the queue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is full when `abs(tail - head) == >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mask`, so the correct calculation here >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hold `size - 1` elements, otherwise you >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cannot distinguish between a full and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an empty queue without another bit of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata, which is a cost you're trying >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caller is trying to be "optimal" by >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specifying a power of two in the `size` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parameter of the `odp_queue_param_t` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> passed to `odp_queue_create()`. For >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this reason we may wish to return a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `max_size` of a power of 2 - 1 in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this patch series. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of 2. Should be documented as such, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since this is an internal routine. In >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this case an `ODP_ASSERT(size == >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requirement would be a useful >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> debugging aid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to deal with wrap arounds, otherwise >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be misinterpreted as overly large >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since it's `uint32_t`. Note that GCC >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and clang recognize `abs()` and treat >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it as a builtin, so there's no actual >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `stdlib.h` call here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (nit). https://github.com/Linaro/odp/pull/492#discussion_r170276901 updated_at 2018-02-23 15:19:39