Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 420
@@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char 
*name,
        queue->s.pktin = PKTIN_INVALID;
        queue->s.pktout = PKTOUT_INVALID;
 
-       queue->s.head = NULL;
-       queue->s.tail = NULL;
+       ring_st_init(&queue->s.ring_st,
+                    queue_tbl->ring_data[queue->s.index].data,
+                    CONFIG_QUEUE_SIZE);


Comment:
Agreed.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed, this is good for now. Later we may wish to honor the user-requested 
> queue `size` parameter.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Agreed.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Correct, as noted earlier. I withdraw that comment.


>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>> Presumably the compiler will see the overlap and optimize away the 
>>>> redundancy, so I assume the performance impact will be nil here.


>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>> Since you're allowing `head` and `tail` to run over the entire 32-bit 
>>>>> range, you're correct that you can completely fill the ring. I did miss 
>>>>> that point. However, as shown above you still need this to be:
>>>>> 
>>>>> ```
>>>>> num = size - abs(tail - head);
>>>>> ```
>>>>> To avoid problems at the 32-bit wrap boundary.


>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>> You're computing in 32 bits not 8 bits, and your ring size is less than 
>>>>>> 2^32 elements. Consider the following test program:
>>>>>> ```
>>>>>> #include <stdlib.h>
>>>>>> #include <stdio.h>
>>>>>> #include <stdint.h>
>>>>>> #include <inttypes.h>
>>>>>> 
>>>>>> void main() {
>>>>>>  uint32_t head[4] = {0, 1, 2, 3};
>>>>>>  uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd};
>>>>>>  uint32_t mask = 4095;
>>>>>>  uint32_t result;
>>>>>>  int i;
>>>>>> 
>>>>>>  for (i = 0; i < 4; i++) {
>>>>>>          printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>>>>                 head[i], tail[i]);
>>>>>> 
>>>>>>          result = tail[i] - head[i];
>>>>>>          printf("tail - head = %" PRIu32 "\n", result);
>>>>>> 
>>>>>>          result = (tail[i] - head[i]) & mask;
>>>>>>          printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>>>>> 
>>>>>>          result = abs(tail[i] - head[i]);
>>>>>>          printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>>>>>  }
>>>>>> }
>>>>>> ```
>>>>>> in theory `tail - head` should be the number of elements in the ring, in 
>>>>>> this case 0, 2, 4, and 6. But running this test program gives the 
>>>>>> following output:
>>>>>> ```
>>>>>> head = 0 tail = 0:
>>>>>> tail - head = 0
>>>>>> (tail - head) & mask = 0
>>>>>> abs(tail - head) = 0
>>>>>> 
>>>>>> head = 1 tail = 4294967295:
>>>>>> tail - head = 4294967294
>>>>>> (tail - head) & mask = 4094
>>>>>> abs(tail - head) = 2
>>>>>> 
>>>>>> head = 2 tail = 4294967294:
>>>>>> tail - head = 4294967292
>>>>>> (tail - head) & mask = 4092
>>>>>> abs(tail - head) = 4
>>>>>> 
>>>>>> head = 3 tail = 4294967293:
>>>>>> tail - head = 4294967290
>>>>>> (tail - head) & mask = 4090
>>>>>> abs(tail - head) = 6
>>>>>> ```
>>>>>> Since you're allowing head to run free over the 32-bit range of the 
>>>>>> variable, when the 32-bits rolls over you'll get a large positive 
>>>>>> number, not the small one you need to stay within the ring bounds. The 
>>>>>> alternative is to mask `head` and `tail` as you increment them, but then 
>>>>>> you run into the effective range issue.


>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>> OK


>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> That functionality is not obvious from the name. It either implies 
>>>>>>>>> that one of the input arguments is written (not true here) or the 
>>>>>>>>> reader might assume that it is an expression without side-effect and 
>>>>>>>>> should be deleted (what I originally thought when reading it). You 
>>>>>>>>> should pick a routine name that makes it clear it's actually doing 
>>>>>>>>> something real, in this case performing prefetch processing.


>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from QoS point 
>>>>>>>>>>> of view. 
>>>>>>>>>>> 
>>>>>>>>>>> I'm planning to use config file for run time tunning, so this hard 
>>>>>>>>>>> coding may change in that phase.


>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>> One entry is not lost. User provided size if not (currently) 
>>>>>>>>>>>>> used. Queue size is always 4k.


>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>> OK, added checks in v2.


>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed in v2.


>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do not 
>>>>>>>>>>>>>>>>>> wrap around when the ring is full. I think you assume that 
>>>>>>>>>>>>>>>>>> the store index is 0...size-1, while it's full uint32_t 
>>>>>>>>>>>>>>>>>> which is then masked to get the actual index.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> For example:
>>>>>>>>>>>>>>>>>> size = 100;
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Empty:
>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>> tail = 100
>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Full:
>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>> tail = 200
>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Wrap uint32_t + full:
>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C
>>>>>>>>>>>>>>>>>> tail = 0
>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. */
>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t 
>>>>>>>>>>>>>>>>>>> *data, uint32_t size)
>>>>>>>>>>>>>>>>>>> {


>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to buffer 
>>>>>>>>>>>>>>>>>>>> header pointers. The counter operation is 
>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side effect of 
>>>>>>>>>>>>>>>>>>>> the function, which may be changed/moved any time if it's 
>>>>>>>>>>>>>>>>>>>> found out that there's a place for prefetching. I actually 
>>>>>>>>>>>>>>>>>>>> plan to test if number of prefetches should be limited as 
>>>>>>>>>>>>>>>>>>>> e.g. 32 consecutive prefetches may be too much for some 
>>>>>>>>>>>>>>>>>>>> CPU architectures.


>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. 
>>>>>>>>>>>>>>>>>>>>> Especially, when the if clause is as complex as this and 
>>>>>>>>>>>>>>>>>>>>> there's danger for reader to miss the '!' sign. 


>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also when 
>>>>>>>>>>>>>>>>>>>>>> someone would modify the bitfield from two to three 
>>>>>>>>>>>>>>>>>>>>>> fields later on. Similarly to memset() zero is used for 
>>>>>>>>>>>>>>>>>>>>>> struct inits.


>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t 
>>>>>>>>>>>>>>>>>>>>>>> variables, wrap a round is handled already.
>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used already in 
>>>>>>>>>>>>>>>>>>>>>>> the other ring implementation see ring_deq_multi().


>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a 
>>>>>>>>>>>>>>>>>>>>>>>> typedef, since it's easier to spot the type name (as 
>>>>>>>>>>>>>>>>>>>>>>>> it's not mixed into struct field names). Checkpatch 
>>>>>>>>>>>>>>>>>>>>>>>> passes so this style should be OK.


>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no 
>>>>>>>>>>>>>>>>>>>>>>>>> added performance benefit?


>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted 
>>>>>>>>>>>>>>>>>>>>>>>>>> earlier, if we're not going to use the user-supplied 
>>>>>>>>>>>>>>>>>>>>>>>>>> queue size.


>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous 
>>>>>>>>>>>>>>>>>>>>>>>>>>> statement that should be deleted. Renaming this to 
>>>>>>>>>>>>>>>>>>>>>>>>>>> something like `prefetch_dequeued_bufs()` would 
>>>>>>>>>>>>>>>>>>>>>>>>>>> make the intent clearer here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> more natural here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to handle the effective queue capacity. The 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied `size` should then be set to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ROUNDUP_POWER2_U32(size) - 1` for the masking to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> work properly.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distinguish queue empty/full, this should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returned as `CONFIG_QUEUE_SIZE - 1`, and we 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also need to ensure that `CONFIG_QUEUE_SIZE` is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> itself a power of 2.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.buffer` there's no need to set 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.u32` here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> based on `ODP_CONFIG_POOLS`. Do we want to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return to that here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad things 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will happen here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `head == tail` therefore the queue is full 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when `abs(tail - head) == mask`, so the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> correct calculation here is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `size - 1` elements, otherwise you cannot 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distinguish between a full and an empty 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> queue without another bit of metadata, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which is a cost you're trying to avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is trying to be "optimal" by specifying a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> power of two in the `size` parameter of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may wish to return a `max_size` of a power 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of 2 - 1 in `odp_queue_capability()` as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> part of this patch series.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Should be documented as such, since this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is an internal routine. In this case an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requirement would be a useful debugging 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deal with wrap arounds, otherwise may be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misinterpreted as overly large since it's 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `uint32_t`. Note that GCC and clang 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recognize `abs()` and treat it as a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> builtin, so there's no actual `stdlib.h` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> call here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r170151159
updated_at 2018-02-23 02:21:54

Reply via email to