Re: [lng-odp] [PATCH v2] Ring implementation of queues

Github ODP bot Thu, 22 Feb 2018 19:10:23 -0800

Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 95
@@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa)
        capa->max_sched_groups  = sched_fn->num_grps();
        capa->sched_prios       = odp_schedule_num_prio();
        capa->plain.max_num     = capa->max_queues;
+       capa->plain.max_size    = CONFIG_QUEUE_SIZE;
        capa->plain.nonblocking = ODP_BLOCKING;
        capa->sched.max_num     = capa->max_queues;
+       capa->sched.max_size    = CONFIG_QUEUE_SIZE;



Comment:
Agreed.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Correct, as noted earlier. I withdraw that comment.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Presumably the compiler will see the overlap and optimize away the 
>> redundancy, so I assume the performance impact will be nil here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Since you're allowing `head` and `tail` to run over the entire 32-bit 
>>> range, you're correct that you can completely fill the ring. I did miss 
>>> that point. However, as shown above you still need this to be:
>>> 
>>> ```
>>> num = size - abs(tail - head);
>>> ```
>>> To avoid problems at the 32-bit wrap boundary.


>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>> You're computing in 32 bits not 8 bits, and your ring size is less than 
>>>> 2^32 elements. Consider the following test program:
>>>> ```
>>>> #include <stdlib.h>
>>>> #include <stdio.h>
>>>> #include <stdint.h>
>>>> #include <inttypes.h>
>>>> 
>>>> void main() {
>>>>    uint32_t head[4] = {0, 1, 2, 3};
>>>>    uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd};
>>>>    uint32_t mask = 4095;
>>>>    uint32_t result;
>>>>    int i;
>>>> 
>>>>    for (i = 0; i < 4; i++) {
>>>>            printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>>                   head[i], tail[i]);
>>>> 
>>>>            result = tail[i] - head[i];
>>>>            printf("tail - head = %" PRIu32 "\n", result);
>>>> 
>>>>            result = (tail[i] - head[i]) & mask;
>>>>            printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>>> 
>>>>            result = abs(tail[i] - head[i]);
>>>>            printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>>>    }
>>>> }
>>>> ```
>>>> in theory `tail - head` should be the number of elements in the ring, in 
>>>> this case 0, 2, 4, and 6. But running this test program gives the 
>>>> following output:
>>>> ```
>>>> head = 0 tail = 0:
>>>> tail - head = 0
>>>> (tail - head) & mask = 0
>>>> abs(tail - head) = 0
>>>> 
>>>> head = 1 tail = 4294967295:
>>>> tail - head = 4294967294
>>>> (tail - head) & mask = 4094
>>>> abs(tail - head) = 2
>>>> 
>>>> head = 2 tail = 4294967294:
>>>> tail - head = 4294967292
>>>> (tail - head) & mask = 4092
>>>> abs(tail - head) = 4
>>>> 
>>>> head = 3 tail = 4294967293:
>>>> tail - head = 4294967290
>>>> (tail - head) & mask = 4090
>>>> abs(tail - head) = 6
>>>> ```
>>>> Since you're allowing head to run free over the 32-bit range of the 
>>>> variable, when the 32-bits rolls over you'll get a large positive number, 
>>>> not the small one you need to stay within the ring bounds. The alternative 
>>>> is to mask `head` and `tail` as you increment them, but then you run into 
>>>> the effective range issue.


>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>> OK


>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>> That functionality is not obvious from the name. It either implies that 
>>>>>>> one of the input arguments is written (not true here) or the reader 
>>>>>>> might assume that it is an expression without side-effect and should be 
>>>>>>> deleted (what I originally thought when reading it). You should pick a 
>>>>>>> routine name that makes it clear it's actually doing something real, in 
>>>>>>> this case performing prefetch processing.


>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>> Didn't try larger than 32. 32 is already quite large from QoS point 
>>>>>>>>> of view. 
>>>>>>>>> 
>>>>>>>>> I'm planning to use config file for run time tunning, so this hard 
>>>>>>>>> coding may change in that phase.


>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>> One entry is not lost.


>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>> One entry is not lost. User provided size if not (currently) used. 
>>>>>>>>>>> Queue size is always 4k.


>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>> OK, added checks in v2.


>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed in v2.


>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do not 
>>>>>>>>>>>>>>>> wrap around when the ring is full. I think you assume that the 
>>>>>>>>>>>>>>>> store index is 0...size-1, while it's full uint32_t which is 
>>>>>>>>>>>>>>>> then masked to get the actual index.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For example:
>>>>>>>>>>>>>>>> size = 100;
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Empty:
>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>> tail = 100
>>>>>>>>>>>>>>>> num = 100 - 100 = 0
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Full:
>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>> tail = 200
>>>>>>>>>>>>>>>> num = 200 - 100 = 100
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Wrap uint32_t + full:
>>>>>>>>>>>>>>>> head = 0xFFFFFF9C
>>>>>>>>>>>>>>>> tail = 0
>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. */
>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t 
>>>>>>>>>>>>>>>>> *data, uint32_t size)
>>>>>>>>>>>>>>>>> {


>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to buffer 
>>>>>>>>>>>>>>>>>> header pointers. The counter operation is 
>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side effect of 
>>>>>>>>>>>>>>>>>> the function, which may be changed/moved any time if it's 
>>>>>>>>>>>>>>>>>> found out that there's a place for prefetching. I actually 
>>>>>>>>>>>>>>>>>> plan to test if number of prefetches should be limited as 
>>>>>>>>>>>>>>>>>> e.g. 32 consecutive prefetches may be too much for some CPU 
>>>>>>>>>>>>>>>>>> architectures.


>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. 
>>>>>>>>>>>>>>>>>>> Especially, when the if clause is as complex as this and 
>>>>>>>>>>>>>>>>>>> there's danger for reader to miss the '!' sign. 


>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also when 
>>>>>>>>>>>>>>>>>>>> someone would modify the bitfield from two to three fields 
>>>>>>>>>>>>>>>>>>>> later on. Similarly to memset() zero is used for struct 
>>>>>>>>>>>>>>>>>>>> inits.


>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t 
>>>>>>>>>>>>>>>>>>>>> variables, wrap a round is handled already.
>>>>>>>>>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used already in 
>>>>>>>>>>>>>>>>>>>>> the other ring implementation see ring_deq_multi().


>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a typedef, 
>>>>>>>>>>>>>>>>>>>>>> since it's easier to spot the type name (as it's not 
>>>>>>>>>>>>>>>>>>>>>> mixed into struct field names). Checkpatch passes so 
>>>>>>>>>>>>>>>>>>>>>> this style should be OK.


>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no added 
>>>>>>>>>>>>>>>>>>>>>>> performance benefit?


>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted 
>>>>>>>>>>>>>>>>>>>>>>>> earlier, if we're not going to use the user-supplied 
>>>>>>>>>>>>>>>>>>>>>>>> queue size.


>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous 
>>>>>>>>>>>>>>>>>>>>>>>>> statement that should be deleted. Renaming this to 
>>>>>>>>>>>>>>>>>>>>>>>>> something like `prefetch_dequeued_bufs()` would make 
>>>>>>>>>>>>>>>>>>>>>>>>> the intent clearer here.


>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems 
>>>>>>>>>>>>>>>>>>>>>>>>>> more natural here.


>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` 
>>>>>>>>>>>>>>>>>>>>>>>>>>> to handle the effective queue capacity. The 
>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied `size` should then be set to 
>>>>>>>>>>>>>>>>>>>>>>>>>>> `ROUNDUP_POWER2_U32(size) - 1` for the masking to 
>>>>>>>>>>>>>>>>>>>>>>>>>>> work properly.


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distinguish queue empty/full, this should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returned as `CONFIG_QUEUE_SIZE - 1`, and we also 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to ensure that `CONFIG_QUEUE_SIZE` is itself 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a power of 2.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.buffer` there's no need to set 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.u32` here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning based 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on `ODP_CONFIG_POOLS`. Do we want to return to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad things 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will happen here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `head == tail` therefore the queue is full 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when `abs(tail - head) == mask`, so the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> correct calculation here is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `size - 1` elements, otherwise you cannot 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distinguish between a full and an empty queue 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without another bit of metadata, which is a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cost you're trying to avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> trying to be "optimal" by specifying a power 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of two in the `size` parameter of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wish to return a `max_size` of a power of 2 - 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1 in `odp_queue_capability()` as part of this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> patch series.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Should be documented as such, since this is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an internal routine. In this case an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requirement would be a useful debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with wrap arounds, otherwise may be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misinterpreted as overly large since it's 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `uint32_t`. Note that GCC and clang 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recognize `abs()` and treat it as a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> builtin, so there's no actual `stdlib.h` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> call here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r170150984
updated_at 2018-02-23 02:21:54

Re: [lng-odp] [PATCH v2] Ring implementation of queues

Reply via email to