As we agreed this morning, we should continue the DDF work without NUMA to have a running solution for Connect. We will then work on NUMA support.
FF

On 2 January 2017 at 18:02, Christophe Milard <christophe.mil...@linaro.org> wrote:
> To start with: happy new year :-)
>
> Are you saying, Francois, that you want this to have support for NUMA
> directly?
> In my opinion, having this as it is now enables building lists of
> things, which, in turn, enables proceeding on the driver work
> (enumerators...). Do you think we should delay this to support NUMA
> first?
> I am not convinced it is the best thing to do: sometimes it is shorter
> (in time) to redo things than not to progress.
> Having said that, you are right that redefining the interfaces
> (partly, at least for NUMA) costs a bit...
> Are you joining the Arch call tomorrow, Francois? Maybe a topic to
> take up then...
>
> Christophe.
>
> On 2 January 2017 at 17:48, Francois Ozog <francois.o...@linaro.org> wrote:
>> Sorry for joining the discussion late...
>>
>> Here are a few points:
>>
>> The more we defer taking NUMA nodes into account in the memory-related
>> APIs, the more we'll suffer when we are forced to integrate them. I
>> would strongly suggest we address this topic now.
>>
>> Petri mentioned that we could do allocations on HW-managed buffers, so
>> the memory backend of the pool may have to be selectable: either
>> shm_reserve or something else. This in turn means that there may be a
>> need for shm_ops operations and special considerations for freeing
>> allocated "units". DPDK went through a significant rework of its pool
>> to accommodate external pools.
>>
>> Performance is a key factor for the implementation. If I recall the
>> comments about NGINX's poor performance correctly, memory allocation
>> accounted for 30% of the time needed to handle a packet.
>>
>> FF
>>
>> On 27 December 2016 at 11:51, Savolainen, Petri (Nokia - FI/Espoo)
>> <petri.savolai...@nokia-bell-labs.com> wrote:
>>>> >> >
>>>> >> > typedef struct {
>>>> >> >         // sum of all (simultaneous) allocs
>>>> >> >         uint64_t pool_size;
>>>> >> >
>>>> >> >         // Minimum alloc size the application will request from the pool
>>>> >> >         uint32_t min_alloc;
>>>> >> >
>>>> >> >         // Maximum alloc size the application will request from the pool
>>>> >> >         uint32_t max_alloc;
>>>> >> >
>>>> >> > } odp_shm_pool_param_t;
>>>> >>
>>>> >> That makes more sense.
>>>> >> Not sure we really need a struct for 3 values, but it could well be
>>>> >> so on the north interface if you want.
>>>> >
>>>> > With a param struct it's easier to add new params in a
>>>> > backwards-compatible manner (if needed). Also, we use 'param' with
>>>> > all other create calls.
>>>> >
>>>>
>>>> I think min_alloc will mostly either be 1 (a pool for many different
>>>> things unknown at pool creation time) or equal to max_alloc (a pool
>>>> of one single thing). So this looks a bit like overkill to me.
>>>>
>>>> But I will do as you want. :-)
>>>
>>>
>>> It depends on how well the application knows the usage in advance. For
>>> example, if it uses a pool for building a database of struct foo's (80
>>> bytes) and struct bar's (100 bytes), then it can expect the best
>>> performance with min=80, max=100, rather than min=1, max=100.
>>>
>>>
>>>> >> OK. As buddy allocation guarantees a minimum of 50% efficiency, we
>>>> >> can do it like this:
>>>> >> if (min_alloc == max_alloc)
>>>> >>         => slab (size = pool size rounded up to an integer number of min_alloc units)
>>>> >> else
>>>> >>         => buddy (next_power_of_2(2*given_pool_sz))
>>>> >>
>>>> >> OK?
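For illustration, the selection rule proposed just above might look like the following in C. This is only a sketch of the heuristic under discussion, not an existing ODP interface; the names choose_allocator, ALLOC_SLAB, ALLOC_BUDDY and internal_sz are hypothetical:

    #include <stdint.h>

    /* Hypothetical allocator kinds for the shm pool sketch. */
    typedef enum { ALLOC_SLAB, ALLOC_BUDDY } alloc_kind_t;

    /* Round x up to the next power of two (x must be > 0). */
    static uint64_t next_power_of_2(uint64_t x)
    {
        x--;
        x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
        x |= x >> 8;  x |= x >> 16; x |= x >> 32;
        return x + 1;
    }

    /*
     * Pick slab when every allocation has the same size, otherwise fall
     * back to buddy. Buddy's 50% worst-case efficiency is covered by
     * reserving twice the requested pool size, rounded up to a power
     * of two.
     */
    static alloc_kind_t choose_allocator(uint64_t pool_size,
                                         uint32_t min_alloc,
                                         uint32_t max_alloc,
                                         uint64_t *internal_sz)
    {
        if (min_alloc == max_alloc) {
            /* Slab: pool size rounded up to a whole number of units. */
            uint64_t units = (pool_size + min_alloc - 1) / min_alloc;

            *internal_sz = units * min_alloc;
            return ALLOC_SLAB;
        }

        *internal_sz = next_power_of_2(2 * pool_size);
        return ALLOC_BUDDY;
    }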
>>>> > Seems OK. Also, the algorithm is easy to tune afterward. E.g. you
>>>> > could use slab for a bit wider range - it should be the faster of
>>>> > the two, since it's simpler.
>>>> >
>>>> > if (min_alloc == max_alloc || min_alloc > 0.8*max_alloc)
>>>>
>>>> I guess the tipping point is 2 instead of 0.8: if max_alloc >
>>>> 2*min_alloc, then a buddy allocator will perform better than a slab.
>>>> If not, a slab is better: its per-unit waste (max_alloc minus the
>>>> actual request) then stays below 50%, while the buddy allocator still
>>>> has to cope with its worst-case rounding.
>>>
>>>
>>> The selection will be based on CPU cycle vs. memory consumption
>>> figures. Up to some point, the user is willing to sacrifice higher
>>> memory usage for lower CPU cycle consumption. The point will be based
>>> on the CPU/memory performance difference of the two implementations,
>>> system memory size, cache size, etc. It may be min = 0.8*max,
>>> min = 0.5*max, or something else.
>>>
>>> -Petri
>>
>>
>> --
>> François-Frédéric Ozog | Director Linaro Networking Group
>> T: +33.67221.6485
>> francois.o...@linaro.org | Skype: ffozog


--
François-Frédéric Ozog | Director Linaro Networking Group
T: +33.67221.6485
francois.o...@linaro.org | Skype: ffozog
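Francois's point earlier in the thread, that the pool's memory backend may have to be selectable (shm_reserve or something else, with shm_ops operations and special handling when freeing units), could eventually take a shape similar to DPDK's external mempool ops. A minimal sketch of such a vtable follows; every name in it (shm_ops_t, reserve, release, alloc, free, ctx) is hypothetical and not an existing ODP API:

    #include <stdint.h>

    /*
     * Hypothetical backend operations for an shm-based pool. A pool
     * backed by plain shm_reserve and one backed by a HW-managed buffer
     * pool would each provide their own implementation, including any
     * special handling needed to free allocated "units".
     */
    typedef struct shm_ops {
        /* Reserve the backing memory for the whole pool. */
        void *(*reserve)(const char *name, uint64_t size, void *ctx);

        /* Release the backing memory. */
        void (*release)(void *base, uint64_t size, void *ctx);

        /* Allocate one unit from the pool. */
        void *(*alloc)(void *pool, uint32_t size);

        /* Return a unit to the pool. */
        void (*free)(void *pool, void *unit);

        /* Backend-private context passed to the calls above. */
        void *ctx;
    } shm_ops_t;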