Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c line 103 @@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name, param = &default_param; } + if (param->size > CONFIG_QUEUE_SIZE)
Comment: One entry is not lost. User provided size if not (currently) used. Queue size is always 4k. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> OK, added checks in v2. >>>> Petri Savolainen(psavol) wrote: >>>> OK. Compiler probably did that already, but changed in v2. >>>>> Petri Savolainen(psavol) wrote: >>>>> Tail and head indexes are (masked from) uint32_t and do not wrap around >>>>> when the ring is full. I think you assume that the store index is >>>>> 0...size-1, while it's full uint32_t which is then masked to get the >>>>> actual index. >>>>> >>>>> For example: >>>>> size = 100; >>>>> >>>>> Empty: >>>>> head = 100 >>>>> tail = 100 >>>>> num = 100 - 100 = 0 >>>>> >>>>> Full: >>>>> head = 100 >>>>> tail = 200 >>>>> num = 200 - 100 = 100 >>>>> >>>>> Wrap uint32_t + full: >>>>> head = 0xFFFFFF9C >>>>> tail = 0 >>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100 >>>>> >>>>> So, no abs() needed. Ring size can be 4096, instead of 4095. >>>>>> Petri Savolainen(psavol) wrote: >>>>>> It's already documented 5 lines above: >>>>>> >>>>>> /* Initialize ring. Ring size must be a power of two. */ >>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, >>>>>> uint32_t size) >>>>>> { >>>>>>> Petri Savolainen(psavol) wrote: >>>>>>> This function converts 32 bit buffer indexes to buffer header pointers. >>>>>>> The counter operation is buffer_index_from_buf(). The prefetch is a >>>>>>> side effect of the function, which may be changed/moved any time if >>>>>>> it's found out that there's a place for prefetching. I actually plan to >>>>>>> test if number of prefetches should be limited as e.g. 32 consecutive >>>>>>> prefetches may be too much for some CPU architectures. >>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>> I prefer style where '== 0' is used instead of '!'. Especially, when >>>>>>>> the if clause is as complex as this and there's danger for reader to >>>>>>>> miss the '!' sign. >>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>> It's there to ensure that all bits are zero also when someone would >>>>>>>>> modify the bitfield from two to three fields later on. Similarly to >>>>>>>>> memset() zero is used for struct inits. >>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>> There's no need for abs(). Since it's all uint32_t variables, wrap a >>>>>>>>>> round is handled already. >>>>>>>>>> An example in 8bits: >>>>>>>>>> 0xff - 0xfd = 0x02 >>>>>>>>>> 0x00 - 0xfe = 0x02 >>>>>>>>>> 0x01 - 0xff = 0x02 >>>>>>>>>> 0x02 - 0x00 = 0x02 >>>>>>>>>> >>>>>>>>>> This passes both gcc and clang, and is used already in the other >>>>>>>>>> ring implementation see ring_deq_multi(). >>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>> I prefer style with blank line in the end of a typedef, since it's >>>>>>>>>>> easier to spot the type name (as it's not mixed into struct field >>>>>>>>>>> names). Checkpatch passes so this style should be OK. >>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>> Does this mean that sizes larger than 32 have no added performance >>>>>>>>>>>> benefit? >>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're >>>>>>>>>>>>> not going to use the user-supplied queue size. >>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>> Given its name, this looks like an extraneous statement that >>>>>>>>>>>>>> should be deleted. Renaming this to something like >>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the intent clearer here. >>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more natural >>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle >>>>>>>>>>>>>>>> the effective queue capacity. The user-supplied `size` should >>>>>>>>>>>>>>>> then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking >>>>>>>>>>>>>>>> to work properly. >>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>> Same comment here as for plain queues. >>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to distinguish >>>>>>>>>>>>>>>>>> queue empty/full, this should be returned as >>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that >>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE` is itself a power of 2. >>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and `index.buffer` >>>>>>>>>>>>>>>>>>> there's no need to set `index.u32` here. >>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>> We originally had this index partitioning based on >>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that here? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to >>>>>>>>>>>>>>>>>>>> ensure that `ODP_CONFIG_POOLS < 256` or else bad things >>>>>>>>>>>>>>>>>>>> will happen here. >>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>> This routine can be optimized to: >>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail; >>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == tail` >>>>>>>>>>>>>>>>>>>>>> therefore the queue is full when `abs(tail - head) == >>>>>>>>>>>>>>>>>>>>>> mask`, so the correct calculation here is: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - 1` >>>>>>>>>>>>>>>>>>>>>> elements, otherwise you cannot distinguish between a >>>>>>>>>>>>>>>>>>>>>> full and an empty queue without another bit of metadata, >>>>>>>>>>>>>>>>>>>>>> which is a cost you're trying to avoid. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is trying to >>>>>>>>>>>>>>>>>>>>>> be "optimal" by specifying a power of two in the `size` >>>>>>>>>>>>>>>>>>>>>> parameter of the `odp_queue_param_t` passed to >>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may wish to >>>>>>>>>>>>>>>>>>>>>> return a `max_size` of a power of 2 - 1 in >>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of this patch series. >>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should be >>>>>>>>>>>>>>>>>>>>>>> documented as such, since this is an internal routine. >>>>>>>>>>>>>>>>>>>>>>> In this case an `ODP_ASSERT(size == >>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this requirement would >>>>>>>>>>>>>>>>>>>>>>> be a useful debugging aid. >>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with wrap >>>>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be misinterpreted as overly >>>>>>>>>>>>>>>>>>>>>>>> large since it's `uint32_t`. Note that GCC and clang >>>>>>>>>>>>>>>>>>>>>>>> recognize `abs()` and treat it as a builtin, so >>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` call here. >>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169891968 updated_at 2018-02-22 09:32:40