Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
line 420
@@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char 
*name,
        queue->s.pktin = PKTIN_INVALID;
        queue->s.pktout = PKTOUT_INVALID;
 
-       queue->s.head = NULL;
-       queue->s.tail = NULL;
+       ring_st_init(&queue->s.ring_st,
+                    queue_tbl->ring_data[queue->s.index].data,
+                    CONFIG_QUEUE_SIZE);


Comment:
One entry is not lost.

> Petri Savolainen(psavol) wrote:
> One entry is not lost. User provided size if not (currently) used. Queue size 
> is always 4k.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


>>>> Petri Savolainen(psavol) wrote:
>>>> OK, added checks in v2.


>>>>> Petri Savolainen(psavol) wrote:
>>>>> OK. Compiler probably did that already, but changed in v2.


>>>>>> Petri Savolainen(psavol) wrote:
>>>>>> Tail and head indexes are (masked from) uint32_t and do not wrap around 
>>>>>> when the ring is full. I think you assume that the store index is 
>>>>>> 0...size-1, while it's full uint32_t which is then masked to get the 
>>>>>> actual index.
>>>>>> 
>>>>>> For example:
>>>>>> size = 100;
>>>>>> 
>>>>>> Empty:
>>>>>> head = 100
>>>>>> tail = 100
>>>>>> num = 100 - 100 = 0
>>>>>> 
>>>>>> Full:
>>>>>> head = 100
>>>>>> tail = 200
>>>>>> num = 200 - 100 = 100
>>>>>> 
>>>>>> Wrap uint32_t + full:
>>>>>> head = 0xFFFFFF9C
>>>>>> tail = 0
>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>> 
>>>>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>> It's already documented 5 lines above:
>>>>>>> 
>>>>>>> /* Initialize ring. Ring size must be a power of two. */
>>>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
>>>>>>> uint32_t size)
>>>>>>> {


>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>> This function converts 32 bit buffer indexes to buffer header 
>>>>>>>> pointers. The counter operation is buffer_index_from_buf(). The 
>>>>>>>> prefetch is a side effect of the function, which may be changed/moved 
>>>>>>>> any time if it's found out that there's a place for prefetching. I 
>>>>>>>> actually plan to test if number of prefetches should be limited as 
>>>>>>>> e.g. 32 consecutive prefetches may be too much for some CPU 
>>>>>>>> architectures.


>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>> I prefer style where '== 0' is used instead of '!'. Especially, when 
>>>>>>>>> the if clause is as complex as this and there's danger for reader to 
>>>>>>>>> miss the '!' sign. 


>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>> It's there to ensure that all bits are zero also when someone would 
>>>>>>>>>> modify the bitfield from two to three fields later on. Similarly to 
>>>>>>>>>> memset() zero is used for struct inits.


>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t variables, wrap 
>>>>>>>>>>> a round is handled already.
>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>> 
>>>>>>>>>>> This passes both gcc and clang, and is used already in the other 
>>>>>>>>>>> ring implementation see ring_deq_multi().


>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>> I prefer style with blank line in the end of a typedef, since it's 
>>>>>>>>>>>> easier to spot the type name (as it's not mixed into struct field 
>>>>>>>>>>>> names). Checkpatch passes so this style should be OK.


>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no added 
>>>>>>>>>>>>> performance benefit?


>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if 
>>>>>>>>>>>>>> we're not going to use the user-supplied queue size.


>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>> Given its name, this looks like an extraneous statement that 
>>>>>>>>>>>>>>> should be deleted. Renaming this to something like 
>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the intent clearer here.


>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more natural 
>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle 
>>>>>>>>>>>>>>>>> the effective queue capacity. The user-supplied `size` should 
>>>>>>>>>>>>>>>>> then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking 
>>>>>>>>>>>>>>>>> to work properly.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to distinguish 
>>>>>>>>>>>>>>>>>>> queue empty/full, this should be returned as 
>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that 
>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE` is itself a power of 2.


>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and `index.buffer` 
>>>>>>>>>>>>>>>>>>>> there's no need to set `index.u32` here.


>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning based on 
>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that here?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to 
>>>>>>>>>>>>>>>>>>>>> ensure that `ODP_CONFIG_POOLS < 256` or else bad things 
>>>>>>>>>>>>>>>>>>>>> will happen here.


>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == 
>>>>>>>>>>>>>>>>>>>>>>> tail` therefore the queue is full when `abs(tail - 
>>>>>>>>>>>>>>>>>>>>>>> head) == mask`, so the correct calculation here is:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - 1` 
>>>>>>>>>>>>>>>>>>>>>>> elements, otherwise you cannot distinguish between a 
>>>>>>>>>>>>>>>>>>>>>>> full and an empty queue without another bit of 
>>>>>>>>>>>>>>>>>>>>>>> metadata, which is a cost you're trying to avoid.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is trying to 
>>>>>>>>>>>>>>>>>>>>>>> be "optimal" by specifying a power of two in the `size` 
>>>>>>>>>>>>>>>>>>>>>>> parameter of the `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may wish to 
>>>>>>>>>>>>>>>>>>>>>>> return a `max_size` of a power of 2 - 1 in 
>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of this patch series.


>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should be 
>>>>>>>>>>>>>>>>>>>>>>>> documented as such, since this is an internal routine. 
>>>>>>>>>>>>>>>>>>>>>>>> In this case an `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this requirement would 
>>>>>>>>>>>>>>>>>>>>>>>> be a useful debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with wrap 
>>>>>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be misinterpreted as overly 
>>>>>>>>>>>>>>>>>>>>>>>>> large since it's `uint32_t`. Note that GCC and clang 
>>>>>>>>>>>>>>>>>>>>>>>>> recognize `abs()` and treat it as a builtin, so 
>>>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` call here.


>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169892348
updated_at 2018-02-22 09:32:40

Reply via email to