On Sun, Nov 08, 2020 at 02:19:55PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > >>
> > >>>>>>>>>>>>>>>>>> Hi Olivier,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
> > >>>> whatever
> > >>>>>> the
> > >>>>>>>> value
> > >>>>>>>>>> of m->next,
> > >>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
> > >> !=
> > >>>> 1.
> > >>>>>> For
> > >>>>>>>>>> instance in this
> > >>>>>>>>>>>>>>>>>>> case:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>    m1 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m1, 500);
> > >>>>>>>>>>>>>>>>>>>    m2 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m2, 500);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m1, m2);
> > >>>>>>>>>>>>>>>>>>>    m0 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m0, 500);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m0, m1);
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
> > >>>> nb_seg in
> > >>>>>> the
> > >>>>>>>>>> initial m1
> > >>>>>>>>>>>>>>>>>>> segment (this is not required), after
> > >> this
> > >>>> code
> > >>>>>> the
> > >>>>>>>>>> mbuf chain
> > >>>>>>>>>>>>>>>>>>> have 3 segments:
> > >>>>>>>>>>>>>>>>>>>    - m0: next=m1, nb_seg=3
> > >>>>>>>>>>>>>>>>>>>    - m1: next=m2, nb_seg=2
> > >>>>>>>>>>>>>>>>>>>    - m2: next=NULL, nb_seg=1
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
> > >> restore
> > >>>>>> nb_seg=1
> > >>>>>>>> in
> > >>>>>>>>>> the second
> > >>>>>>>>>>>>>>>>>>> segment.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
> > >>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
> > >>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
> > >> *m)
> > >>>>>>>>>>>>>>>>>> {
> > >>>>>>>>>>>>>>>>>>  ...
> > >>>>>>>>>>>>>>>>>>  if (m->next != NULL) {
> > >>>>>>>>>>>>>>>>>>                          m->next = NULL;
> > >>>>>>>>>>>>>>>>>>                          m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>>                  }
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
> > >> if()
> > >>>>>> block,
> > >>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
> > >>>>>>>>>>>>>>>>>> What I am missing here?
> > >>>>>>>>>>>>>>>>>> Thinking in more generic way, that
> > >> change:
> > >>>>>>>>>>>>>>>>>>   -              if (m->next != NULL) {
> > >>>>>>>>>>>>>>>>>>   -                      m->next = NULL;
> > >>>>>>>>>>>>>>>>>>   -                      m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>>   -              }
> > >>>>>>>>>>>>>>>>>>   +              m->next = NULL;
> > >>>>>>>>>>>>>>>>>>   +              m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
> > >> and
> > >>>> now
> > >>>>>> it
> > >>>>>>>> does
> > >>>>>>>>>> not
> > >>>>>>>>>>>>>>>>> show the issue...
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The full example also adds a split() to
> > >> break
> > >>>> the
> > >>>>>>>> mbuf
> > >>>>>>>>>> chain
> > >>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
> > >>>> would
> > >>>>>> be
> > >>>>>>>> done
> > >>>>>>>>>> for
> > >>>>>>>>>>>>>>>>> software TCP segmentation.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> If so, may be the right solution is to care
> > >>>> about
> > >>>>>>>> nb_segs
> > >>>>>>>>>>>>>>>> when next is set to NULL on split? Any
> > >> place
> > >>>> when
> > >>>>>> next
> > >>>>>>>> is
> > >>>>>>>>>> set
> > >>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
> > >>>> more
> > >>>>>>>> generic
> > >>>>>>>>>> place.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The problem with that approach is that there
> > >> are
> > >>>>>> already
> > >>>>>>>>>> several
> > >>>>>>>>>>>>>>> existing split() or trim() implementations in
> > >>>>>> different
> > >>>>>>>> DPDK-
> > >>>>>>>>>> based
> > >>>>>>>>>>>>>>> applications. For instance, we have some in
> > >>>>>> 6WINDGate. If
> > >>>>>>>> we
> > >>>>>>>>>> force
> > >>>>>>>>>>>>>>> applications to set nb_seg to 1 when
> > >> resetting
> > >>>> next,
> > >>>>>> it
> > >>>>>>>> has
> > >>>>>>>>>> to be
> > >>>>>>>>>>>>>>> documented because it is not straightforward.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think it is better to go that way.
> > >>>>>>>>>>>>>>  From my perspective it seems natural to reset
> > >>>> nb_seg at
> > >>>>>>>> same
> > >>>>>>>>>> time
> > >>>>>>>>>>>>>> we reset next, otherwise inconsistency will
> > >> occur.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
> > >> me
> > >>>> it
> > >>>>>> was
> > >>>>>>>> clear
> > >>>>>>>>>> that
> > >>>>>>>>>>>>> nb_segs is only valid in the first segment, like
> > >> for
> > >>>> many
> > >>>>>>>> fields
> > >>>>>>>>>> (port,
> > >>>>>>>>>>>>> ol_flags, vlan, rss, ...).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> If we say that nb_segs has to be valid in any
> > >>>> segments,
> > >>>>>> it
> > >>>>>>>> means
> > >>>>>>>>>> that
> > >>>>>>>>>>>>> chain() or split() will have to update it in all
> > >>>>>> segments,
> > >>>>>>>> which
> > >>>>>>>>>> is not
> > >>>>>>>>>>>>> efficient.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Why in all?
> > >>>>>>>>>>>> We can state that nb_segs on non-first segment
> > >> should
> > >>>>>> always
> > >>>>>>>> equal
> > >>>>>>>>>> 1.
> > >>>>>>>>>>>> As I understand in that case, both split() and
> > >> chain()
> > >>>> have
> > >>>>>> to
> > >>>>>>>>>> update nb_segs
> > >>>>>>>>>>>> only for head mbufs, rest ones will remain
> > >> untouched.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Well, anyway, I think it's strange to have a
> > >> constraint
> > >>>> on m-
> > >>>>>>>>> nb_segs
> > >>>>>>>>>> for
> > >>>>>>>>>>> non-first segment. We don't have that kind of
> > >> constraints
> > >>>> for
> > >>>>>>>> other
> > >>>>>>>>>> fields.
> > >>>>>>>>>>
> > >>>>>>>>>> True, we don't. But this is one of the fields we
> > >> consider
> > >>>>>> critical
> > >>>>>>>>>> for proper work of mbuf alloc/free mechanism.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
> > >>>>>> segments
> > >>>>>>>> will provide any benefits.
> > >>>>>>>>
> > >>>>>>>> It would make this patch unneeded.
> > >>>>>>>> So, for direct, non-segmented mbufs  pktmbuf_free() will
> > >> remain
> > >>>>>> write-
> > >>>>>>>> free.
> > >>>>>>>
> > >>>>>>> I see. Then I agree with Konstantin that alternative
> > >> solutions
> > >>>> should
> > >>>>>> be considered.
> > >>>>>>>
> > >>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
> > >> is a
> > >>>>>> very common operation - certainly outweighs the cost of
> > >> requiring
> > >>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
> > >> 1.
> > >>>>>>>
> > >>>>>>> Nonetheless, the bug needs to be fixed somehow.
> > >>>>>>>
> > >>>>>>> If we can't come up with a better solution that doesn't break
> > >> the
> > >>>>>> ABI, we are forced to accept the patch.
> > >>>>>>>
> > >>>>>>> Unless the techboard accepts to break the ABI in order to
> > >> avoid
> > >>>> the
> > >>>>>> performance cost of this patch.
> > >>>>>>
> > >>>>>> Did someone notice a performance drop with this patch?
> > >>>>>> On my side, I don't see any regression on a L3 use case.
> > >>>>>
> > >>>>> I am afraid that the DPDK performance regression tests are based
> > >> on
> > >>>> TX immediately following RX, so cache misses in TX may go by
> > >> unnoticed
> > >>>> because RX warmed up the cache for TX already. And similarly for RX
> > >>>> reusing mbufs that have been warmed up by the preceding free() at
> > >> TX.
> > >>>>>
> > >>>>> Please consider testing the performance difference with the mbuf
> > >>>> being completely cold at TX, and going completely cold again before
> > >>>> being reused for RX.
> > >>>>>
> > >>>>>>
> > >>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
> > >>>> subsequent
> > >>>>>> mbuf
> > >>>>>> allocation to return a mbuf which is not correctly initialized.
> > >>>> There
> > >>>>>> are 2
> > >>>>>> options to fix it:
> > >>>>>>
> > >>>>>> 1/ change the mbuf free function (this patch)
> > >>>>>>
> > >>>>>>     - m->nb_segs would behave like many other field: valid in
> > >> the
> > >>>> first
> > >>>>>>       segment, ignored in other segments
> > >>>>>>     - may impact performance (suspected)
> > >>>>>>
> > >>>>>> 2/ change all places where a mbuf chain is split, or trimmed
> > >>>>>>
> > >>>>>>     - m->nb_segs would have a specific behavior: count the
> > >> number of
> > >>>>>>       segments in the first mbuf, should be 1 in the last
> > >> segment,
> > >>>>>>       ignored in other ones.
> > >>>>>>     - no code change in mbuf library, so no performance impact
> > >>>>>>     - need to patch all places where we do a mbuf split or trim.
> > >>>> From
> > >>>>>> afar,
> > >>>>>>       I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > >>>>>> applications
> > >>>>>>       may have to be patched (for instance, I already found 3
> > >> places
> > >>>> in
> > >>>>>>       6WIND code base without a deep search).
> > >>>>>>
> > >>>>>> In my opinion, 1/ is better, except we notice a significant
> > >>>>>> performance,
> > >>>>>> because the (implicit) behavior is unchanged.
> > >>>>>>
> > >>>>>> Whatever the solution, some documentation has to be added.
> > >>>>>>
> > >>>>>> Olivier
> > >>>>>>
> > >>>>>
> > >>>>> Unfortunately, I don't think that anything but the first option
> > >> will
> > >>>> go into 20.11 and stable releases of older versions, so I stand by
> > >> my
> > >>>> acknowledgment of the patch.
> > >>>>
> > >>>> If we are affraid about 20.11 performance (it is legitimate, few
> > >> days
> > >>>> before the release), we can target 21.02. After all, everybody
> > >> lives
> > >>>> with this bug since 2017, so there is no urgency. If accepted and
> > >> well
> > >>>> tested, it can be backported in stable branches.
> > >>>
> > >>> +1
> > >>>
> > >>> Good thinking, Olivier!
> > >>
> > >> Looking at the changes once again, it probably can be reworked a bit:
> > >>
> > >> -        if (m->next != NULL) {
> > >> -                m->next = NULL;
> > >> -                m->nb_segs = 1;
> > >> -        }
> > >>
> > >> +        if (m->next != NULL)
> > >> +                m->next = NULL;
> > >> +        if (m->nb_segs != 1)
> > >> +                m->nb_segs = 1;
> > >>
> > >> That way we add one more condition checking, but I suppose it
> > >> shouldn't be that perf critical.
> > >> That way for direct,non-segmented mbuf it still should be write-free.
> > >> Except cases as you described above: chain(), then split().
> > >>
> > >> Of-course we still need to do perf testing for that approach too.
> > >> So if your preference it to postpone it till 21.02 - that's ok for me.
> > >> Konstantin
> > >
> > > With this suggestion, I cannot imagine any performance drop for direct, 
> > > non-segmented mbufs: It now reads m->nb_segs, residing in the
> > mbuf's first cache line, but the function already reads m->refcnt in the 
> > first cache line; so no cache misses are introduced.
> > 
> > +1
> 
> I don't expect perf drop with that approach either.
> But some perf testing still needs to be done, just in case 😊 

I also agree with your suggestion Konstantin.
Let's postpone it right after 20.11 so we have more time to test.
I'll send a v2.

Olivier

Reply via email to