Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle)
                ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name);
                return -1;
        }
-       if (queue->s.head != NULL) {
+       if (ring_st_is_empty(&queue->s.ring_st) == 0) {


Comment:
Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
(ring_st_is_empty(...) == 1)` so this is inconsistent. 

> Petri Savolainen(psavol) wrote:
> Didn't try larger than 32. 32 is already quite large from QoS point of view. 
> 
> I'm planning to use config file for run time tunning, so this hard coding may 
> change in that phase.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost. User provided size if not (currently) used. Queue 
>>> size is always 4k.


>>>> Petri Savolainen(psavol) wrote:
>>>> One entry is not lost.


>>>>> Petri Savolainen(psavol) wrote:
>>>>> One entry is not lost.


>>>>>> Petri Savolainen(psavol) wrote:
>>>>>> OK, added checks in v2.


>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>> OK. Compiler probably did that already, but changed in v2.


>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>> Tail and head indexes are (masked from) uint32_t and do not wrap 
>>>>>>>> around when the ring is full. I think you assume that the store index 
>>>>>>>> is 0...size-1, while it's full uint32_t which is then masked to get 
>>>>>>>> the actual index.
>>>>>>>> 
>>>>>>>> For example:
>>>>>>>> size = 100;
>>>>>>>> 
>>>>>>>> Empty:
>>>>>>>> head = 100
>>>>>>>> tail = 100
>>>>>>>> num = 100 - 100 = 0
>>>>>>>> 
>>>>>>>> Full:
>>>>>>>> head = 100
>>>>>>>> tail = 200
>>>>>>>> num = 200 - 100 = 100
>>>>>>>> 
>>>>>>>> Wrap uint32_t + full:
>>>>>>>> head = 0xFFFFFF9C
>>>>>>>> tail = 0
>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>> 
>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>> 
>>>>>>>>> /* Initialize ring. Ring size must be a power of two. */
>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
>>>>>>>>> uint32_t size)
>>>>>>>>> {


>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>> This function converts 32 bit buffer indexes to buffer header 
>>>>>>>>>> pointers. The counter operation is buffer_index_from_buf(). The 
>>>>>>>>>> prefetch is a side effect of the function, which may be 
>>>>>>>>>> changed/moved any time if it's found out that there's a place for 
>>>>>>>>>> prefetching. I actually plan to test if number of prefetches should 
>>>>>>>>>> be limited as e.g. 32 consecutive prefetches may be too much for 
>>>>>>>>>> some CPU architectures.


>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. Especially, 
>>>>>>>>>>> when the if clause is as complex as this and there's danger for 
>>>>>>>>>>> reader to miss the '!' sign. 


>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>> It's there to ensure that all bits are zero also when someone 
>>>>>>>>>>>> would modify the bitfield from two to three fields later on. 
>>>>>>>>>>>> Similarly to memset() zero is used for struct inits.


>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t variables, 
>>>>>>>>>>>>> wrap a round is handled already.
>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This passes both gcc and clang, and is used already in the other 
>>>>>>>>>>>>> ring implementation see ring_deq_multi().


>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>> I prefer style with blank line in the end of a typedef, since 
>>>>>>>>>>>>>> it's easier to spot the type name (as it's not mixed into struct 
>>>>>>>>>>>>>> field names). Checkpatch passes so this style should be OK.


>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no added 
>>>>>>>>>>>>>>> performance benefit?


>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if 
>>>>>>>>>>>>>>>> we're not going to use the user-supplied queue size.


>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous statement that 
>>>>>>>>>>>>>>>>> should be deleted. Renaming this to something like 
>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the intent clearer here.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more 
>>>>>>>>>>>>>>>>>> natural here.


>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle 
>>>>>>>>>>>>>>>>>>> the effective queue capacity. The user-supplied `size` 
>>>>>>>>>>>>>>>>>>> should then be set to `ROUNDUP_POWER2_U32(size) - 1` for 
>>>>>>>>>>>>>>>>>>> the masking to work properly.


>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to 
>>>>>>>>>>>>>>>>>>>>> distinguish queue empty/full, this should be returned as 
>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that 
>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE` is itself a power of 2.


>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and 
>>>>>>>>>>>>>>>>>>>>>> `index.buffer` there's no need to set `index.u32` here.


>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning based on 
>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that here?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to 
>>>>>>>>>>>>>>>>>>>>>>> ensure that `ODP_CONFIG_POOLS < 256` or else bad things 
>>>>>>>>>>>>>>>>>>>>>>> will happen here.


>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == 
>>>>>>>>>>>>>>>>>>>>>>>>> tail` therefore the queue is full when `abs(tail - 
>>>>>>>>>>>>>>>>>>>>>>>>> head) == mask`, so the correct calculation here is:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - 1` 
>>>>>>>>>>>>>>>>>>>>>>>>> elements, otherwise you cannot distinguish between a 
>>>>>>>>>>>>>>>>>>>>>>>>> full and an empty queue without another bit of 
>>>>>>>>>>>>>>>>>>>>>>>>> metadata, which is a cost you're trying to avoid.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is trying 
>>>>>>>>>>>>>>>>>>>>>>>>> to be "optimal" by specifying a power of two in the 
>>>>>>>>>>>>>>>>>>>>>>>>> `size` parameter of the `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may wish to 
>>>>>>>>>>>>>>>>>>>>>>>>> return a `max_size` of a power of 2 - 1 in 
>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of this patch series.


>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should be 
>>>>>>>>>>>>>>>>>>>>>>>>>> documented as such, since this is an internal 
>>>>>>>>>>>>>>>>>>>>>>>>>> routine. In this case an `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this requirement 
>>>>>>>>>>>>>>>>>>>>>>>>>> would be a useful debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with 
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrap arounds, otherwise may be misinterpreted as 
>>>>>>>>>>>>>>>>>>>>>>>>>>> overly large since it's `uint32_t`. Note that GCC 
>>>>>>>>>>>>>>>>>>>>>>>>>>> and clang recognize `abs()` and treat it as a 
>>>>>>>>>>>>>>>>>>>>>>>>>>> builtin, so there's no actual `stdlib.h` call here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169950516
updated_at 2018-02-22 13:14:23

Reply via email to