Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_buffer_internal.h
line 17
@@ -41,11 +41,19 @@ typedef struct seg_entry_t {
        uint32_t  len;
 } seg_entry_t;
 
+typedef union buffer_index_t {
+       uint32_t u32;
+
+       struct {
+               uint32_t pool   :8;
+               uint32_t buffer :24;


Comment:
OK, added checks in v2.

> Petri Savolainen(psavol) wrote:
> OK. Compiler probably did that already, but changed in v2.


>> Petri Savolainen(psavol) wrote:
>> Tail and head indexes are (masked from) uint32_t and do not wrap around when 
>> the ring is full. I think you assume that the store index is 0...size-1, 
>> while it's full uint32_t which is then masked to get the actual index.
>> 
>> For example:
>> size = 100;
>> 
>> Empty:
>> head = 100
>> tail = 100
>> num = 100 - 100 = 0
>> 
>> Full:
>> head = 100
>> tail = 200
>> num = 200 - 100 = 100
>> 
>> Wrap uint32_t + full:
>> head = 0xFFFFFF9C
>> tail = 0
>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>> 
>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>> Petri Savolainen(psavol) wrote:
>>> It's already documented 5 lines above:
>>> 
>>> /* Initialize ring. Ring size must be a power of two. */
>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t 
>>> size)
>>> {


>>>> Petri Savolainen(psavol) wrote:
>>>> This function converts 32 bit buffer indexes to buffer header pointers. 
>>>> The counter operation is buffer_index_from_buf(). The prefetch is a side 
>>>> effect of the function, which may be changed/moved any time if it's found 
>>>> out that there's a place for prefetching. I actually plan to test if 
>>>> number of prefetches should be limited as e.g. 32 consecutive prefetches 
>>>> may be too much for some CPU architectures.


>>>>> Petri Savolainen(psavol) wrote:
>>>>> I prefer style where '== 0' is used instead of '!'. Especially, when the 
>>>>> if clause is as complex as this and there's danger for reader to miss the 
>>>>> '!' sign. 


>>>>>> Petri Savolainen(psavol) wrote:
>>>>>> It's there to ensure that all bits are zero also when someone would 
>>>>>> modify the bitfield from two to three fields later on. Similarly to 
>>>>>> memset() zero is used for struct inits.


>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>> There's no need for abs(). Since it's all uint32_t variables, wrap a 
>>>>>>> round is handled already.
>>>>>>> An example in 8bits:
>>>>>>> 0xff - 0xfd = 0x02
>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>> 0x01 - 0xff = 0x02
>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>> 
>>>>>>> This passes both gcc and clang, and is used already in the other ring 
>>>>>>> implementation see ring_deq_multi().


>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>> I prefer style with blank line in the end of a typedef, since it's 
>>>>>>>> easier to spot the type name (as it's not mixed into struct field 
>>>>>>>> names). Checkpatch passes so this style should be OK.


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> Does this mean that sizes larger than 32 have no added performance 
>>>>>>>>> benefit?


>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're 
>>>>>>>>>> not going to use the user-supplied queue size.


>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>> Given its name, this looks like an extraneous statement that should 
>>>>>>>>>>> be deleted. Renaming this to something like 
>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the intent clearer here.


>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more natural 
>>>>>>>>>>>> here.


>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
>>>>>>>>>>>>> effective queue capacity. The user-supplied `size` should then be 
>>>>>>>>>>>>> set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work 
>>>>>>>>>>>>> properly.


>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to distinguish 
>>>>>>>>>>>>>>> queue empty/full, this should be returned as `CONFIG_QUEUE_SIZE 
>>>>>>>>>>>>>>> - 1`, and we also need to ensure that `CONFIG_QUEUE_SIZE` is 
>>>>>>>>>>>>>>> itself a power of 2.


>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and `index.buffer` 
>>>>>>>>>>>>>>>> there's no need to set `index.u32` here.


>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>> We originally had this index partitioning based on 
>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that here?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure 
>>>>>>>>>>>>>>>>> that `ODP_CONFIG_POOLS < 256` or else bad things will happen 
>>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == tail` 
>>>>>>>>>>>>>>>>>>> therefore the queue is full when `abs(tail - head) == 
>>>>>>>>>>>>>>>>>>> mask`, so the correct calculation here is:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - 1` 
>>>>>>>>>>>>>>>>>>> elements, otherwise you cannot distinguish between a full 
>>>>>>>>>>>>>>>>>>> and an empty queue without another bit of metadata, which 
>>>>>>>>>>>>>>>>>>> is a cost you're trying to avoid.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is trying to be 
>>>>>>>>>>>>>>>>>>> "optimal" by specifying a power of two in the `size` 
>>>>>>>>>>>>>>>>>>> parameter of the `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may wish to return 
>>>>>>>>>>>>>>>>>>> a `max_size` of a power of 2 - 1 in 
>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of this patch series.


>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should be 
>>>>>>>>>>>>>>>>>>>> documented as such, since this is an internal routine. In 
>>>>>>>>>>>>>>>>>>>> this case an `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this requirement would be a 
>>>>>>>>>>>>>>>>>>>> useful debugging aid.


>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with wrap 
>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be misinterpreted as overly large 
>>>>>>>>>>>>>>>>>>>>> since it's `uint32_t`. Note that GCC and clang recognize 
>>>>>>>>>>>>>>>>>>>>> `abs()` and treat it as a builtin, so there's no actual 
>>>>>>>>>>>>>>>>>>>>> `stdlib.h` call here.


>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169891070
updated_at 2018-02-22 09:32:40

Reply via email to