Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -471,51 +476,18 @@ static inline int deq_multi(queue_entry_t *queue, 
odp_buffer_hdr_t *buf_hdr[],
                }
 
                UNLOCK(&queue->s.lock);
-               return 0;
-       }
-
-       for (i = 0; i < num && hdr; ) {
-               int burst_num = hdr->burst_num;
-               int first     = hdr->burst_first;
 
-               /* First, get bursted buffers */
-               for (j = 0; j < burst_num && i < num; j++, i++) {
-                       buf_hdr[i] = hdr->burst[first + j];
-                       odp_prefetch(buf_hdr[i]);
-               }
-
-               if (burst_num) {
-                       hdr->burst_num   = burst_num - j;
-                       hdr->burst_first = first + j;
-               }
-
-               if (i == num)
-                       break;
-
-               /* When burst is empty, consume the current buffer header and
-                * move to the next header */
-               buf_hdr[i] = hdr;
-               next       = hdr->next;
-               hdr->next  = NULL;
-               hdr        = next;
-               updated++;
-               i++;
+               return 0;
        }
 
-       /* Write head only if updated */
-       if (updated)
-               queue->s.head = hdr;
-
-       /* Queue is empty */
-       if (hdr == NULL)
-               queue->s.tail = NULL;
-
        if (status_sync && queue->s.type == ODP_QUEUE_TYPE_SCHED)
                sched_fn->save_context(queue->s.index);
 
        UNLOCK(&queue->s.lock);
 
-       return i;
+       buffer_index_to_buf(buf_hdr, buf_idx, num_deq);


Comment:
This function converts 32 bit buffer indexes to buffer header pointers. The 
counter operation is buffer_index_from_buf(). The prefetch is a side effect of 
the function, which may be changed/moved any time if it's found out that 
there's a place for prefetching. I actually plan to test if number of 
prefetches should be limited as e.g. 32 consecutive prefetches may be too much 
for some CPU architectures.

> Petri Savolainen(psavol) wrote:
> I prefer style where '== 0' is used instead of '!'. Especially, when the if 
> clause is as complex as this and there's danger for reader to miss the '!' 
> sign. 


>> Petri Savolainen(psavol) wrote:
>> It's there to ensure that all bits are zero also when someone would modify 
>> the bitfield from two to three fields later on. Similarly to memset() zero 
>> is used for struct inits.


>>> Petri Savolainen(psavol) wrote:
>>> There's no need for abs(). Since it's all uint32_t variables, wrap a round 
>>> is handled already.
>>> An example in 8bits:
>>> 0xff - 0xfd = 0x02
>>> 0x00 - 0xfe = 0x02
>>> 0x01 - 0xff = 0x02
>>> 0x02 - 0x00 = 0x02
>>> 
>>> This passes both gcc and clang, and is used already in the other ring 
>>> implementation see ring_deq_multi().


>>>> Petri Savolainen(psavol) wrote:
>>>> I prefer style with blank line in the end of a typedef, since it's easier 
>>>> to spot the type name (as it's not mixed into struct field names). 
>>>> Checkpatch passes so this style should be OK.


>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>> Does this mean that sizes larger than 32 have no added performance 
>>>>> benefit?


>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not 
>>>>>> going to use the user-supplied queue size.


>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>> Given its name, this looks like an extraneous statement that should be 
>>>>>>> deleted. Renaming this to something like `prefetch_dequeued_bufs()` 
>>>>>>> would make the intent clearer here.


>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more natural here.


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
>>>>>>>>> effective queue capacity. The user-supplied `size` should then be set 
>>>>>>>>> to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>> As noted earlier, due to "losing" one entry to distinguish queue 
>>>>>>>>>>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and 
>>>>>>>>>>> we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power 
>>>>>>>>>>> of 2.


>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>> Since you're initializing `index.pool` and `index.buffer` there's 
>>>>>>>>>>>> no need to set `index.u32` here.


>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>> We originally had this index partitioning based on 
>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that here?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == tail` 
>>>>>>>>>>>>>>> therefore the queue is full when `abs(tail - head) == mask`, so 
>>>>>>>>>>>>>>> the correct calculation here is:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - 1` elements, 
>>>>>>>>>>>>>>> otherwise you cannot distinguish between a full and an empty 
>>>>>>>>>>>>>>> queue without another bit of metadata, which is a cost you're 
>>>>>>>>>>>>>>> trying to avoid.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This is somewhat problematic if the caller is trying to be 
>>>>>>>>>>>>>>> "optimal" by specifying a power of two in the `size` parameter 
>>>>>>>>>>>>>>> of the `odp_queue_param_t` passed to `odp_queue_create()`. For 
>>>>>>>>>>>>>>> this reason we may wish to return a `max_size` of a power of 2 
>>>>>>>>>>>>>>> - 1 in `odp_queue_capability()` as part of this patch series.


>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should be 
>>>>>>>>>>>>>>>> documented as such, since this is an internal routine. In this 
>>>>>>>>>>>>>>>> case an `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for 
>>>>>>>>>>>>>>>> this requirement would be a useful debugging aid.


>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with wrap 
>>>>>>>>>>>>>>>>> arounds, otherwise may be misinterpreted as overly large 
>>>>>>>>>>>>>>>>> since it's `uint32_t`. Note that GCC and clang recognize 
>>>>>>>>>>>>>>>>> `abs()` and treat it as a builtin, so there's no actual 
>>>>>>>>>>>>>>>>> `stdlib.h` call here.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169879602
updated_at 2018-02-22 09:32:40

Reply via email to