> > > > +#define RTE_QSBR_CNT_THR_OFFLINE 0
> > > > +#define RTE_QSBR_CNT_INIT 1
> > > > +
> > > > +/**
> > > > + * RTE thread Quiescent State structure.
> > > > + * Quiescent state counter array (array of 'struct rte_rcu_qsbr_cnt'),
> > > > + * whose size is dependent on the maximum number of reader threads
> > > > + * (m_threads) using this variable is stored immediately following
> > > > + * this structure.
> > > > + */
> > > > +struct rte_rcu_qsbr {
> > > > +	uint64_t token __rte_cache_aligned;
> > > > +	/**< Counter to allow for multiple simultaneous QS queries */
> > > > +
> > > > +	uint32_t num_elems __rte_cache_aligned;
> > > > +	/**< Number of elements in the thread ID array */
> > > > +	uint32_t m_threads;
> > > > +	/**< Maximum number of threads this RCU variable will use */
> > > > +
> > > > +	uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] __rte_cache_aligned;
> > > > +	/**< Registered thread IDs are stored in a bitmap array */
> > >
> > > As I understand you ended up with a fixed size array to avoid 2
> > > variable size arrays in this struct?
> > Yes
> >
> > > Is that a big penalty for register/unregister() to either store a
> > > pointer to the bitmap, or calculate it based on the num_elems value?
> > In the last RFC I sent out [1], I tested the impact of having a
> > non-fixed size array. There 'was' a performance degradation in most of
> > the performance tests. The issue was with calculating the address of the
> > per thread QSBR counters (not with the address calculation of the bitmap).
> > With the current patch, I do not see the performance difference (the
> > differences between the RFC and this patch are the memory orderings,
> > which are masking any perf gain from having a fixed array). However, I
> > have kept the fixed size array as the generated code does not have
> > additional calculations to get the address of qsbr counter array elements.
> > [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html
>
> Ok I see, but can we then arrange them in a different way:
> qsbr_cnt[] will start at the end of struct rte_rcu_qsbr (same as you
> have it right now), while the bitmap will be placed after qsbr_cnt[].

Yes, that is an option. Though, it would mean we have to calculate the
address, similar to the macro 'RTE_QSBR_CNT_ARRAY_ELM'.
> As I understand register/unregister is not considered on the critical path,
> so some perf-degradation here doesn't matter.

Yes

> Also check() would need an extra address calculation for the bitmap, but
> considering that we have to go through the whole bitmap (and in the worst
> case qsbr_cnt[]) anyway, that is probably not a big deal?

I think the address calculation can be made simpler than what I had tried
before. I can give it a shot.

> > > As another thought - do we really need the bitmap at all?
> > The bit map is helping avoid accessing all the elements in the
> > rte_rcu_qsbr_cnt array (as you have mentioned below). This provides
> > the ability to scale the number of threads dynamically. For ex: an
> > application can create a qsbr variable with 48 max threads, but
> > currently only 2 threads are active (due to traffic conditions).
>
> I understand that the bitmap is supposed to speed up check() for situations
> when most threads are unregistered.
> My thought was that the check() speedup for such a situation is not that
> critical.

IMO, there is a need to address both cases, considering the future
direction of DPDK. It is possible to introduce a counter for the current
number of threads registered. If that is the same as the maximum number of
threads, then scanning the registered thread ID array can be skipped.

> > > Might it be possible to store the register value for each thread inside
> > > its rte_rcu_qsbr_cnt:
> > > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;}
> > > __rte_cache_aligned; ?
> > > That would cause check() to walk through all elems in the
> > > rte_rcu_qsbr_cnt array, but from the other side would help to avoid
> > > cache conflicts for register/unregister.
> > With the addition of the rte_rcu_qsbr_thread_online/offline APIs, the
> > register/unregister APIs are not in the critical path anymore. Hence, the
> > cache conflicts are fine. The online/offline APIs work on thread specific
> > cache lines and these are in the critical path.
> > > > +} __rte_cache_aligned;
> > > > +