Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier:     BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <odp/api/hints.h>
+#include <odp_align_internal.h>
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+       uint32_t head;
+       uint32_t tail;
+       uint32_t mask;
+       uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+       ring->head = 0;
+       ring->tail = 0;
+       ring->mask = size - 1;
+       ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+                                        uint32_t max_num)
+{
+       uint32_t head, tail, mask, idx;
+       uint32_t num, i;
+
+       head = ring->head;
+       tail = ring->tail;
+       mask = ring->mask;
+       num  = tail - head;


Comment:
OK

> Petri Savolainen(psavol) wrote:
> ```
> if (ring_st_is_empty(&queue->s.ring_st))
> 
> if (ring_st_is_empty(&queue->s.ring_st))
> 
> if (ring_st_is_empty(&queue->s.ring_st) == 0)
> 
> if (ring_st_is_empty(&queue->s.ring_st))
> 
> if (ring_st_is_empty(&queue->s.ring_st))
> 
> if (!ring_st_is_empty(&queue->s.ring_st))
> 
> if (ring_st_is_empty(&queue->s.ring_st))
> 
> if (ring_st_is_empty(&queue->s.ring_st))
> ```
> 
> Find if clause for "ring is not empty" from picture above.


>> Petri Savolainen(psavol) wrote:
>> It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail 
>> is always ahead of head, but no more than queue size (e.g. 4k), So, e.g. 
>> when tail is 0, and head is 0xffffffff the number of items in ring is 0 - 
>> 0xffffffff = 1
>> 
>> ```
>> head            tail
>> hex    [index], hex    [index], num, num_free
>> 0xfffffffe [6], 0xffffffff [7], 1,   7
>> 0xffffffff [7], 0x00000000 [0], 1,   7
>> 0x00000000 [0], 0x00000001 [1], 1,   7
>> ```


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Please see above discussion. The comment stands.


>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>> Not true. See the comment below where it's clear that `head` and `tail` 
>>>> are free running and thus will eventually wrap as described above. If 
>>>> these were `uint64_t` variables you could argue that the wraps would take 
>>>> centuries and hence are nothing to worry about, but being `uint32_t` 
>>>> variables this should be expected to happen regularly on heavily used 
>>>> queues. 
>>>> 
>>>> `abs()` adds no overhead since compilers treat it as an intrinsic.


>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>> Note that neither `head` nor `tail` get masked when doing enqueues or 
>>>>> dequeues, so they will traverse the entire 32-bit range of the variable 
>>>>> over time. While the ring itself will never hold more than `size` 
>>>>> entries, `head` and `tail` will wrap around. When you index into the ring 
>>>>> you do so via a mask (`idx = head & mask;`), but calculating the number 
>>>>> of elements in the ring doesn't do this, which is why `abs()` is needed.


>>>>>> Petri Savolainen(psavol) wrote:
>>>>>> Plus this is done only once - in pool create phase.


>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>> No since ring size will be much smaller than 4 billion.


>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>> The point is that ring size will never be close to 4 billion entries. 
>>>>>>>> E.g. currently tail is always max 4096 larger than head. Your example 
>>>>>>>> above is based assumption of 4 billion entry ring. Overflow is avoided 
>>>>>>>> when ring size if <2 billion, as 32 bit indexes can be still used to 
>>>>>>>> calculate number of items correctly.


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> Agreed.


>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>> Agreed, this is good for now. Later we may wish to honor the 
>>>>>>>>>> user-requested queue `size` parameter.


>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>> Agreed.


>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>> Correct, as noted earlier. I withdraw that comment.


>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>> Presumably the compiler will see the overlap and optimize away 
>>>>>>>>>>>>> the redundancy, so I assume the performance impact will be nil 
>>>>>>>>>>>>> here.


>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>> Since you're allowing `head` and `tail` to run over the entire 
>>>>>>>>>>>>>> 32-bit range, you're correct that you can completely fill the 
>>>>>>>>>>>>>> ring. I did miss that point. However, as shown above you still 
>>>>>>>>>>>>>> need this to be:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> num = size - abs(tail - head);
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> To avoid problems at the 32-bit wrap boundary.


>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>> You're computing in 32 bits not 8 bits, and your ring size is 
>>>>>>>>>>>>>>> less than 2^32 elements. Consider the following test program:
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>>> #include <stdint.h>
>>>>>>>>>>>>>>> #include <inttypes.h>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> void main() {
>>>>>>>>>>>>>>>         uint32_t head[4] = {0, 1, 2, 3};
>>>>>>>>>>>>>>>         uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 
>>>>>>>>>>>>>>> 0xfffffffd};
>>>>>>>>>>>>>>>         uint32_t mask = 4095;
>>>>>>>>>>>>>>>         uint32_t result;
>>>>>>>>>>>>>>>         int i;
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>         for (i = 0; i < 4; i++) {
>>>>>>>>>>>>>>>                 printf("head = %" PRIu32 " tail = %" PRIu32 
>>>>>>>>>>>>>>> ":\n",
>>>>>>>>>>>>>>>                        head[i], tail[i]);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>                 result = tail[i] - head[i];
>>>>>>>>>>>>>>>                 printf("tail - head = %" PRIu32 "\n", result);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>                 result = (tail[i] - head[i]) & mask;
>>>>>>>>>>>>>>>                 printf("(tail - head) & mask = %" PRIu32 "\n", 
>>>>>>>>>>>>>>> result);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>                 result = abs(tail[i] - head[i]);
>>>>>>>>>>>>>>>                 printf("abs(tail - head) = %" PRIu32 "\n\n", 
>>>>>>>>>>>>>>> result);
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> in theory `tail - head` should be the number of elements in the 
>>>>>>>>>>>>>>> ring, in this case 0, 2, 4, and 6. But running this test 
>>>>>>>>>>>>>>> program gives the following output:
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> head = 0 tail = 0:
>>>>>>>>>>>>>>> tail - head = 0
>>>>>>>>>>>>>>> (tail - head) & mask = 0
>>>>>>>>>>>>>>> abs(tail - head) = 0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> head = 1 tail = 4294967295:
>>>>>>>>>>>>>>> tail - head = 4294967294
>>>>>>>>>>>>>>> (tail - head) & mask = 4094
>>>>>>>>>>>>>>> abs(tail - head) = 2
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> head = 2 tail = 4294967294:
>>>>>>>>>>>>>>> tail - head = 4294967292
>>>>>>>>>>>>>>> (tail - head) & mask = 4092
>>>>>>>>>>>>>>> abs(tail - head) = 4
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> head = 3 tail = 4294967293:
>>>>>>>>>>>>>>> tail - head = 4294967290
>>>>>>>>>>>>>>> (tail - head) & mask = 4090
>>>>>>>>>>>>>>> abs(tail - head) = 6
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> Since you're allowing head to run free over the 32-bit range of 
>>>>>>>>>>>>>>> the variable, when the 32-bits rolls over you'll get a large 
>>>>>>>>>>>>>>> positive number, not the small one you need to stay within the 
>>>>>>>>>>>>>>> ring bounds. The alternative is to mask `head` and `tail` as 
>>>>>>>>>>>>>>> you increment them, but then you run into the effective range 
>>>>>>>>>>>>>>> issue.


>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>> OK


>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> That functionality is not obvious from the name. It either 
>>>>>>>>>>>>>>>>>> implies that one of the input arguments is written (not true 
>>>>>>>>>>>>>>>>>> here) or the reader might assume that it is an expression 
>>>>>>>>>>>>>>>>>> without side-effect and should be deleted (what I originally 
>>>>>>>>>>>>>>>>>> thought when reading it). You should pick a routine name 
>>>>>>>>>>>>>>>>>> that makes it clear it's actually doing something real, in 
>>>>>>>>>>>>>>>>>> this case performing prefetch processing.


>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>>>>>>>>>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from 
>>>>>>>>>>>>>>>>>>>> QoS point of view. 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm planning to use config file for run time tunning, so 
>>>>>>>>>>>>>>>>>>>> this hard coding may change in that phase.


>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>> One entry is not lost. User provided size if not 
>>>>>>>>>>>>>>>>>>>>>> (currently) used. Queue size is always 4k.


>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> OK, added checks in v2.


>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed 
>>>>>>>>>>>>>>>>>>>>>>>>>> in v2.


>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t 
>>>>>>>>>>>>>>>>>>>>>>>>>>> and do not wrap around when the ring is full. I 
>>>>>>>>>>>>>>>>>>>>>>>>>>> think you assume that the store index is 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0...size-1, while it's full uint32_t which is then 
>>>>>>>>>>>>>>>>>>>>>>>>>>> masked to get the actual index.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> For example:
>>>>>>>>>>>>>>>>>>>>>>>>>>> size = 100;
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Empty:
>>>>>>>>>>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 100
>>>>>>>>>>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Full:
>>>>>>>>>>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 200
>>>>>>>>>>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Wrap uint32_t + full:
>>>>>>>>>>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C
>>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead 
>>>>>>>>>>>>>>>>>>>>>>>>>>> of 4095.


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> two. */
>>>>>>>>>>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t *data, uint32_t size)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> {


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer header pointers. The counter operation is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effect of the function, which may be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changed/moved any time if it's found out that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's a place for prefetching. I actually plan 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to test if number of prefetches should be limited 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as e.g. 32 consecutive prefetches may be too much 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for some CPU architectures.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '!'. Especially, when the if clause is as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> complex as this and there's danger for reader to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> miss the '!' sign. 


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also when someone would modify the bitfield 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from two to three fields later on. Similarly to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> memset() zero is used for struct inits.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t variables, wrap a round is handled 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already in the other ring implementation see 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ring_deq_multi().


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a typedef, since it's easier to spot the type 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> name (as it's not mixed into struct field 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> names). Checkpatch passes so this style 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be OK.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have no added performance benefit?


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> noted earlier, if we're not going to use 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the user-supplied queue size.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extraneous statement that should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deleted. Renaming this to something like 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intent clearer here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (!ring_st_is_empty(&queue->s.ring_st))` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems more natural here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONFIG_QUEUE_SIZE)` to handle the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effective queue capacity. The 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied `size` should then be set 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to `ROUNDUP_POWER2_U32(size) - 1` for 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the masking to work properly.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entry to distinguish queue empty/full, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this should be returned as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to ensure that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE` is itself a power 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of 2.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.pool` and `index.buffer` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no need to set `index.u32` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partitioning based on 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return to that here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> empty when `head == tail` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> therefore the queue is full when 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `abs(tail - head) == mask`, so the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> correct calculation here is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only hold `size - 1` elements, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherwise you cannot distinguish 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> between a full and an empty queue 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without another bit of metadata, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which is a cost you're trying to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the caller is trying to be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "optimal" by specifying a power of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> two in the `size` parameter of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason we may wish to return a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `max_size` of a power of 2 - 1 in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of this patch series.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> power of 2. Should be documented 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as such, since this is an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal routine. In this case an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this requirement would be a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> useful debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> head);` to deal with wrap 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misinterpreted as overly large 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since it's `uint32_t`. Note that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GCC and clang recognize `abs()` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and treat it as a builtin, so 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> call here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r170581894
updated_at 2018-02-26 12:56:13

Reply via email to