2016-02-29 12:51, Panu Matilainen:
> On 02/24/2016 03:23 PM, Ananyev, Konstantin wrote:
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> >> On 02/23/2016 07:35 AM, Xie, Huawei wrote:
> >>> On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> >>>> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
> >>>>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
> >>>>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not
> >>>>>> part of the library ABI and should not be listed in the version
> >>>>>> map.
> >>>>>>
> >>>>>> I assume it is inline for performance reasons, but then you lose
> >>>>>> the benefits of dynamic linking, such as the ability to fix bugs
> >>>>>> and/or improve it by just updating the library. Since the point of
> >>>>>> having a bulk API is to improve performance by reducing the number
> >>>>>> of calls required, does it really have to be inline? As in, have
> >>>>>> you actually measured the difference between inline and non-inline
> >>>>>> and decided it is worth all the downsides?
> >>>>> Agree with Panu. It would be interesting to compare the performance
> >>>>> between inline and non-inline to decide whether to inline it or
> >>>>> not.
> >>>> Will update after I have gathered more data. Inline could show an
> >>>> obvious performance difference in some cases.
> >>>
> >>> Panu and Olivier:
> >>> I wrote a simple benchmark. It runs 10M rounds; in each round,
> >>> 8 mbufs are allocated through the bulk API and then freed.
> >>> These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
> >>> 2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
> >>> Btw, I have removed some exceptional data points, whose frequency was
> >>> about 1/10. Sometimes the observed user CPU usage suddenly
> >>> disappeared; no clue what happened.
> >>>
> >>> With 8 mbufs allocated, there is about a 6% performance increase
> >>> using inline.
> >> [...]
> >>>
> >>> With 16 mbufs allocated, we can still observe an obvious performance
> >>> difference, though only 1%-2%.
> >>>
> >> [...]
> >>>
> >>> With 32/64 mbufs allocated, the deviation of the data itself would
> >>> hide the performance difference.
> >>> So we prefer using inline for performance.
> >>
> >> At least I was more after real-world performance in a real-world
> >> use case rather than CPU cycles in a microbenchmark; we know function
> >> calls have a cost, but the benefits tend to outweigh the cons.
> >>
> >> Inline functions have their place, and they're far less evil in
> >> project-internal use, but in a library's public API they are BAD and
> >> should be ... well, not banned, because there are exceptions to every
> >> rule, but highly discouraged.
> >
> > Why is that?
>
> For all the reasons static linking is bad, and what's worse, it forces
> the static-linking badness into dynamically linked builds.
>
> If there's a bug (security or otherwise) in a library, a distro wants
> to supply an updated package which fixes that bug and be done with it.
> But if that bug is in inlined code, supplying an update is not enough:
> you also need to recompile everything using that code, and somehow
> inform customers possibly using that code that they need to not only
> update the library but recompile their apps as well. That is precisely
> the reason distros go to great lengths to avoid *any* statically linked
> apps and libs in the distro, completely regardless of the performance
> overhead.
>
> In addition, inlined code complicates ABI compatibility because some of
> the code is on the "wrong" side, and worse, it bypasses all the other
> ABI compatibility safeguards like soname and symbol versioning.
>
> Like I said, inlined code is fine for internal consumption, but
> incredibly bad for public interfaces. And of course, the more
> complicated a function is, the greater the potential need for bugfixes.
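To make the symbol-versioning point above concrete: a linker version script only controls symbols that the shared object actually exports, so listing a static inline function there does nothing useful. The fragment below is purely illustrative (the version node names are hypothetical), in the style of DPDK's `.map` files:

```
DPDK_2.3 {
	global:

	/* No effect for a static inline function: it emits no symbol in
	 * the .so, so this entry cannot version (or later fix) the code
	 * already compiled into applications. */
	rte_pktmbuf_alloc_bulk;

} DPDK_2.2;
```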
>
> Mind you, none of this is magically specific to this particular
> function. Except in the sense that bulk operations offer a better way
> of achieving performance improvements than just inlining everything.
>
> > As you can see, right now we have all mbuf alloc/free routines as
> > static inline.
> > And I think we would like to keep it like that.
> > So why should that particular function be different?
>
> Because there's much less need to have it inlined, since the function
> call overhead is "amortized" by the fact it's doing bulk operations.
> "We always did it that way" is not a very good reason :)
>
> > After all, that function is nothing more than a wrapper around
> > rte_mempool_get_bulk(), plus a loop of {rte_pktmbuf_reset()} unrolled
> > by 4.
> > So unless the mempool get/put API changes, I can hardly see how there
> > could be any ABI breakage in future.
> > About the 'real world' performance gain - it was a 'real world'
> > performance problem that we tried to solve by introducing that
> > function:
> > http://dpdk.org/ml/archives/dev/2015-May/017633.html
> >
> > And according to the user feedback, it does help:
> > http://dpdk.org/ml/archives/dev/2016-February/033203.html
>
> The question is not whether the function is useful, not at all. The
> question is whether the real-world case sees any measurable difference
> in performance if the function is made non-inline.
This is a valid question, and it applies to a large part of DPDK. But it's something to measure and change more globally than just for one new function. Generally speaking, any effort to reduce the size of the exported headers will be welcome. That said, this patch won't be blocked.