Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 103
@@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name,
                param = &default_param;
        }
 
+       if (param->size > CONFIG_QUEUE_SIZE)


Comment:
Agreed, this is good for now. Later we may wish to honor the user-requested 
queue `size` parameter.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Correct, as noted earlier. I withdraw that comment.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Presumably the compiler will see the overlap and optimize away the 
>>> redundancy, so I assume the performance impact will be nil here.


>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>> Since you're allowing `head` and `tail` to run over the entire 32-bit 
>>>> range, you're correct that you can completely fill the ring. I did miss 
>>>> that point. However, as shown above you still need this to be:
>>>> 
>>>> ```
>>>> num = size - abs(tail - head);
>>>> ```
>>>> To avoid problems at the 32-bit wrap boundary.


>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>> You're computing in 32 bits not 8 bits, and your ring size is less than 
>>>>> 2^32 elements. Consider the following test program:
>>>>> ```
>>>>> #include <stdlib.h>
>>>>> #include <stdio.h>
>>>>> #include <stdint.h>
>>>>> #include <inttypes.h>
>>>>> 
>>>>> void main() {
>>>>>   uint32_t head[4] = {0, 1, 2, 3};
>>>>>   uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd};
>>>>>   uint32_t mask = 4095;
>>>>>   uint32_t result;
>>>>>   int i;
>>>>> 
>>>>>   for (i = 0; i < 4; i++) {
>>>>>           printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>>>                  head[i], tail[i]);
>>>>> 
>>>>>           result = tail[i] - head[i];
>>>>>           printf("tail - head = %" PRIu32 "\n", result);
>>>>> 
>>>>>           result = (tail[i] - head[i]) & mask;
>>>>>           printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>>>> 
>>>>>           result = abs(tail[i] - head[i]);
>>>>>           printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>>>>   }
>>>>> }
>>>>> ```
>>>>> in theory `tail - head` should be the number of elements in the ring, in 
>>>>> this case 0, 2, 4, and 6. But running this test program gives the 
>>>>> following output:
>>>>> ```
>>>>> head = 0 tail = 0:
>>>>> tail - head = 0
>>>>> (tail - head) & mask = 0
>>>>> abs(tail - head) = 0
>>>>> 
>>>>> head = 1 tail = 4294967295:
>>>>> tail - head = 4294967294
>>>>> (tail - head) & mask = 4094
>>>>> abs(tail - head) = 2
>>>>> 
>>>>> head = 2 tail = 4294967294:
>>>>> tail - head = 4294967292
>>>>> (tail - head) & mask = 4092
>>>>> abs(tail - head) = 4
>>>>> 
>>>>> head = 3 tail = 4294967293:
>>>>> tail - head = 4294967290
>>>>> (tail - head) & mask = 4090
>>>>> abs(tail - head) = 6
>>>>> ```
>>>>> Since you're allowing head to run free over the 32-bit range of the 
>>>>> variable, when the 32-bits rolls over you'll get a large positive number, 
>>>>> not the small one you need to stay within the ring bounds. The 
>>>>> alternative is to mask `head` and `tail` as you increment them, but then 
>>>>> you run into the effective range issue.


>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>> OK


>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>> That functionality is not obvious from the name. It either implies 
>>>>>>>> that one of the input arguments is written (not true here) or the 
>>>>>>>> reader might assume that it is an expression without side-effect and 
>>>>>>>> should be deleted (what I originally thought when reading it). You 
>>>>>>>> should pick a routine name that makes it clear it's actually doing 
>>>>>>>> something real, in this case performing prefetch processing.


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from QoS point 
>>>>>>>>>> of view. 
>>>>>>>>>> 
>>>>>>>>>> I'm planning to use config file for run time tunning, so this hard 
>>>>>>>>>> coding may change in that phase.


>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>> One entry is not lost. User provided size if not (currently) used. 
>>>>>>>>>>>> Queue size is always 4k.


>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>> OK, added checks in v2.


>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed in v2.


>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do not 
>>>>>>>>>>>>>>>>> wrap around when the ring is full. I think you assume that 
>>>>>>>>>>>>>>>>> the store index is 0...size-1, while it's full uint32_t which 
>>>>>>>>>>>>>>>>> is then masked to get the actual index.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> For example:
>>>>>>>>>>>>>>>>> size = 100;
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Empty:
>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>> tail = 100
>>>>>>>>>>>>>>>>> num = 100 - 100 = 0
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Full:
>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>> tail = 200
>>>>>>>>>>>>>>>>> num = 200 - 100 = 100
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Wrap uint32_t + full:
>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C
>>>>>>>>>>>>>>>>> tail = 0
>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. */
>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t 
>>>>>>>>>>>>>>>>>> *data, uint32_t size)
>>>>>>>>>>>>>>>>>> {


>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to buffer 
>>>>>>>>>>>>>>>>>>> header pointers. The counter operation is 
>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side effect of 
>>>>>>>>>>>>>>>>>>> the function, which may be changed/moved any time if it's 
>>>>>>>>>>>>>>>>>>> found out that there's a place for prefetching. I actually 
>>>>>>>>>>>>>>>>>>> plan to test if number of prefetches should be limited as 
>>>>>>>>>>>>>>>>>>> e.g. 32 consecutive prefetches may be too much for some CPU 
>>>>>>>>>>>>>>>>>>> architectures.


>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. 
>>>>>>>>>>>>>>>>>>>> Especially, when the if clause is as complex as this and 
>>>>>>>>>>>>>>>>>>>> there's danger for reader to miss the '!' sign. 


>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also when 
>>>>>>>>>>>>>>>>>>>>> someone would modify the bitfield from two to three 
>>>>>>>>>>>>>>>>>>>>> fields later on. Similarly to memset() zero is used for 
>>>>>>>>>>>>>>>>>>>>> struct inits.


>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t 
>>>>>>>>>>>>>>>>>>>>>> variables, wrap a round is handled already.
>>>>>>>>>>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used already in 
>>>>>>>>>>>>>>>>>>>>>> the other ring implementation see ring_deq_multi().


>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a typedef, 
>>>>>>>>>>>>>>>>>>>>>>> since it's easier to spot the type name (as it's not 
>>>>>>>>>>>>>>>>>>>>>>> mixed into struct field names). Checkpatch passes so 
>>>>>>>>>>>>>>>>>>>>>>> this style should be OK.


>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no added 
>>>>>>>>>>>>>>>>>>>>>>>> performance benefit?


>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted 
>>>>>>>>>>>>>>>>>>>>>>>>> earlier, if we're not going to use the user-supplied 
>>>>>>>>>>>>>>>>>>>>>>>>> queue size.


>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous 
>>>>>>>>>>>>>>>>>>>>>>>>>> statement that should be deleted. Renaming this to 
>>>>>>>>>>>>>>>>>>>>>>>>>> something like `prefetch_dequeued_bufs()` would make 
>>>>>>>>>>>>>>>>>>>>>>>>>> the intent clearer here.


>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems 
>>>>>>>>>>>>>>>>>>>>>>>>>>> more natural here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to handle the effective queue capacity. The 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied `size` should then be set to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ROUNDUP_POWER2_U32(size) - 1` for the masking to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> work properly.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distinguish queue empty/full, this should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returned as `CONFIG_QUEUE_SIZE - 1`, and we also 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to ensure that `CONFIG_QUEUE_SIZE` is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> itself a power of 2.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.buffer` there's no need to set 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.u32` here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> based on `ODP_CONFIG_POOLS`. Do we want to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return to that here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad things 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will happen here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `head == tail` therefore the queue is full 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when `abs(tail - head) == mask`, so the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> correct calculation here is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `size - 1` elements, otherwise you cannot 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distinguish between a full and an empty 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> queue without another bit of metadata, which 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is a cost you're trying to avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is trying to be "optimal" by specifying a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> power of two in the `size` parameter of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wish to return a `max_size` of a power of 2 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - 1 in `odp_queue_capability()` as part of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this patch series.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Should be documented as such, since this is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an internal routine. In this case an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requirement would be a useful debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deal with wrap arounds, otherwise may be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misinterpreted as overly large since it's 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `uint32_t`. Note that GCC and clang 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recognize `abs()` and treat it as a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> builtin, so there's no actual `stdlib.h` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> call here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r170151030
updated_at 2018-02-23 02:21:54

Reply via email to