Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 24 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include <odp/api/hints.h> +#include <odp_align_internal.h> + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; +
Comment: OK > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > OK, the `ODP_ASSERT()` would still be useful for debugging. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> That functionality is not obvious from the name. It either implies that one >> of the input arguments is written (not true here) or the reader might assume >> that it is an expression without side-effect and should be deleted (what I >> originally thought when reading it). You should pick a routine name that >> makes it clear it's actually doing something real, in this case performing >> prefetch processing. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >>> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>>> Petri Savolainen(psavol) wrote: >>>> Didn't try larger than 32. 32 is already quite large from QoS point of >>>> view. >>>> >>>> I'm planning to use config file for run time tunning, so this hard coding >>>> may change in that phase. >>>>> Petri Savolainen(psavol) wrote: >>>>> One entry is not lost. >>>>>> Petri Savolainen(psavol) wrote: >>>>>> One entry is not lost. User provided size if not (currently) used. Queue >>>>>> size is always 4k. >>>>>>> Petri Savolainen(psavol) wrote: >>>>>>> One entry is not lost. >>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>> One entry is not lost. >>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>> OK, added checks in v2. >>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>> OK. Compiler probably did that already, but changed in v2. >>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do not wrap >>>>>>>>>>> around when the ring is full. I think you assume that the store >>>>>>>>>>> index is 0...size-1, while it's full uint32_t which is then masked >>>>>>>>>>> to get the actual index. >>>>>>>>>>> >>>>>>>>>>> For example: >>>>>>>>>>> size = 100; >>>>>>>>>>> >>>>>>>>>>> Empty: >>>>>>>>>>> head = 100 >>>>>>>>>>> tail = 100 >>>>>>>>>>> num = 100 - 100 = 0 >>>>>>>>>>> >>>>>>>>>>> Full: >>>>>>>>>>> head = 100 >>>>>>>>>>> tail = 200 >>>>>>>>>>> num = 200 - 100 = 100 >>>>>>>>>>> >>>>>>>>>>> Wrap uint32_t + full: >>>>>>>>>>> head = 0xFFFFFF9C >>>>>>>>>>> tail = 0 >>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100 >>>>>>>>>>> >>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of 4095. >>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>> It's already documented 5 lines above: >>>>>>>>>>>> >>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. */ >>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, >>>>>>>>>>>> uint32_t size) >>>>>>>>>>>> { >>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>> This function converts 32 bit buffer indexes to buffer header >>>>>>>>>>>>> pointers. The counter operation is buffer_index_from_buf(). The >>>>>>>>>>>>> prefetch is a side effect of the function, which may be >>>>>>>>>>>>> changed/moved any time if it's found out that there's a place for >>>>>>>>>>>>> prefetching. I actually plan to test if number of prefetches >>>>>>>>>>>>> should be limited as e.g. 32 consecutive prefetches may be too >>>>>>>>>>>>> much for some CPU architectures. >>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. Especially, >>>>>>>>>>>>>> when the if clause is as complex as this and there's danger for >>>>>>>>>>>>>> reader to miss the '!' sign. >>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>> It's there to ensure that all bits are zero also when someone >>>>>>>>>>>>>>> would modify the bitfield from two to three fields later on. >>>>>>>>>>>>>>> Similarly to memset() zero is used for struct inits. >>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t variables, >>>>>>>>>>>>>>>> wrap a round is handled already. >>>>>>>>>>>>>>>> An example in 8bits: >>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02 >>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02 >>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02 >>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This passes both gcc and clang, and is used already in the >>>>>>>>>>>>>>>> other ring implementation see ring_deq_multi(). >>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a typedef, since >>>>>>>>>>>>>>>>> it's easier to spot the type name (as it's not mixed into >>>>>>>>>>>>>>>>> struct field names). Checkpatch passes so this style should >>>>>>>>>>>>>>>>> be OK. >>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no added >>>>>>>>>>>>>>>>>> performance benefit? >>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if >>>>>>>>>>>>>>>>>>> we're not going to use the user-supplied queue size. >>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous statement >>>>>>>>>>>>>>>>>>>> that should be deleted. Renaming this to something like >>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the intent clearer >>>>>>>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more >>>>>>>>>>>>>>>>>>>>> natural here. >>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to >>>>>>>>>>>>>>>>>>>>>> handle the effective queue capacity. The user-supplied >>>>>>>>>>>>>>>>>>>>>> `size` should then be set to `ROUNDUP_POWER2_U32(size) - >>>>>>>>>>>>>>>>>>>>>> 1` for the masking to work properly. >>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues. >>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to >>>>>>>>>>>>>>>>>>>>>>>> distinguish queue empty/full, this should be returned >>>>>>>>>>>>>>>>>>>>>>>> as `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure >>>>>>>>>>>>>>>>>>>>>>>> that `CONFIG_QUEUE_SIZE` is itself a power of 2. >>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and >>>>>>>>>>>>>>>>>>>>>>>>> `index.buffer` there's no need to set `index.u32` >>>>>>>>>>>>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning based on >>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that >>>>>>>>>>>>>>>>>>>>>>>>>> here? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to >>>>>>>>>>>>>>>>>>>>>>>>>> ensure that `ODP_CONFIG_POOLS < 256` or else bad >>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here. >>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to: >>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail; >>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == >>>>>>>>>>>>>>>>>>>>>>>>>>>> tail` therefore the queue is full when `abs(tail - >>>>>>>>>>>>>>>>>>>>>>>>>>>> head) == mask`, so the correct calculation here is: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1` elements, otherwise you cannot distinguish >>>>>>>>>>>>>>>>>>>>>>>>>>>> between a full and an empty queue without another >>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of metadata, which is a cost you're trying to >>>>>>>>>>>>>>>>>>>>>>>>>>>> avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is >>>>>>>>>>>>>>>>>>>>>>>>>>>> trying to be "optimal" by specifying a power of >>>>>>>>>>>>>>>>>>>>>>>>>>>> two in the `size` parameter of the >>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to >>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may wish >>>>>>>>>>>>>>>>>>>>>>>>>>>> to return a `max_size` of a power of 2 - 1 in >>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of this patch >>>>>>>>>>>>>>>>>>>>>>>>>>>> series. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be documented as such, since this is an internal >>>>>>>>>>>>>>>>>>>>>>>>>>>>> routine. In this case an `ODP_ASSERT(size == >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this requirement >>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be a useful debugging aid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrap arounds, otherwise may be misinterpreted as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> overly large since it's `uint32_t`. Note that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GCC and clang recognize `abs()` and treat it as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a builtin, so there's no actual `stdlib.h` call >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169951888 updated_at 2018-02-22 13:22:36