Fair enough. If Nokia would like to write the multi-threaded stress-test
app to go with the benchmarking they are doing, I can restructure the
implementation into a series of stages that are more easily "digested".
What I don't want to have happen is this area get back-burnered with just a
general agreement that it will be worked on "sometime later". DPDK is
evolving quite rapidly these days and ODP cannot afford to be seen as
dragging our feet if we want to remain competitive.

References are something that a number of our members have been requesting
for some time and we've said they will be part of Tiger Moth, so I'd like
us to be responsive to those requests.

On Mon, Feb 20, 2017 at 11:02 AM, Francois Ozog <francois.o...@linaro.org>
wrote:

> Thanks Petri,
>
> You accurately summarized what we said in arch call today.
>
> Bill, we'll cover the topic again on Wednesday. I have seen how it
> happened behind the scenes for DPDK: 1 year architectural discussion of
> implications (kind of background mode, coffee time), then little by little
> patch additions bottom up (from pool changes to full API impact). I think
> we need to be extra-careful with this complex topic.
>
> Cordially,
>
> FF
>
>
>
> On 20 February 2017 at 16:01, Savolainen, Petri (Nokia - FI/Espoo) <
> petri.savolai...@nokia-bell-labs.com> wrote:
>
>> Hi,
>>
>> We are already in the phase where code in master should be maintained for
>> production quality. There is no hurry to merge in code that has
>> questionable quality. Zero copy packet references is not a trivial feature
>> to implement. We are in much better position to review, test and use the
>> reference code, if it's developed in phases. That's why I propose that we
>> do multiple smaller steps:
>>
>> 1) merge simple, copy based implementation first (in api-next and
>> master), which we can be sure that is does not break anything
>> 2) write multi-threaded (performance) test apps for refs
>> 3) cleanup, optimize normal packet code towards zero-copy multi-ref
>> support (minimize places where refs are visible)
>> 4) implement zero-copy multi-ref for most obvious use cases: maybe static
>> references first ...
>> 5) continue multi-ref implementation, or decide that some rare corner
>> cases can be left with copy based implementation
>>
>> This actually follows what has been done before. Add simple, copy based
>> implementation first and continue development/optimization from there. So,
>> that we can step back and compare easily with previous version, if e.g.
>> race conditions are found.
>>
>> -Petri
>>
>>
>> > -----Original Message-----
>> > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of
>> Bill
>> > Fischofer
>> > Sent: Saturday, February 18, 2017 6:28 PM
>> > To: Francois Ozog <francois.o...@linaro.org>
>> > Cc: lng-odp@lists.linaro.org
>> > Subject: Re: [lng-odp] [API-NEXT PATCHv7 2/5] linux-generic: packet:
>> > implement reference apis
>> >
>> > On Sat, Feb 18, 2017 at 9:57 AM, Francois Ozog <
>> francois.o...@linaro.org>
>> > wrote:
>> >
>> > > Well, problem is still there.
>> > > You are doing something on a packet that may not exist anymore.
>> > >
>> >
>> > Can you elaborate? The bug fix patch eliminates the race condition that
>> > Janne pointed out because no thread manipulates a packet after
>> > decrementing
>> > the ref_count other than to free it if that operation decremented the
>> > ref_count to 0.
>> >
>> >
>> > >
>> > > On 17 February 2017 at 22:08, Bill Fischofer <
>> bill.fischo...@linaro.org>
>> > > wrote:
>> > >
>> > >> I've posted patch http://patches.opendataplane.org/patch/8155/ to
>> > >> address this issue.  It goes on api-next on top of patches
>> > >> http://patches.opendataplane.org/patch/7879/ and
>> > >> http://patches.opendataplane.org/patch/8154/
>> > >>
>> > >> On Fri, Feb 17, 2017 at 2:39 PM, Bill Fischofer
>> > >> <bill.fischo...@linaro.org> wrote:
>> > >> > First off, thank you very much for this review.
>> > >> >
>> > >> > Please note that this code has been streamlined in patch
>> > >> > http://patches.opendataplane.org/patch/7879/ and has been further
>> > >> > refined with patch http://patches.opendataplane.org/patch/8145/
>> but
>> > >> > the exposure you identify still exists in that code.
>> > >> >
>> > >> > On Fri, Feb 17, 2017 at 11:31 AM, Peltonen, Janne (Nokia -
>> FI/Espoo)
>> > >> > <janne.pelto...@nokia.com> wrote:
>> > >> >> Hi,
>> > >> >>
>> > >> >> I took a look at the packet references and it seems to me that
>> > >> >> either the implementation is a bit racy or I confused myself
>> > >> >> when reading the code. Or maybe I got the intended concurrency
>> > >> >> semantics of the packet references wrong?
>> > >> >>
>> > >> >> My first issue is that packet_free() may access freed packet
>> > >> >> header or corrupt unshared_len.
>> > >> >>
>> > >> >> The packet free function looks like this:
>> > >> >>
>> > >> >> static inline void packet_free(odp_packet_hdr_t *pkt_hdr)
>> > >> >> {
>> > >> >>         odp_packet_hdr_t *ref_hdr;
>> > >> >>         uint32_t ref_count;
>> > >> >>
>> > >> >>         do {
>> > >> >>                 ref_hdr = pkt_hdr->ref_hdr;
>> > >> >>                 ref_count = packet_ref_count(pkt_hdr) - 1;
>> > >> >>                 free_bufs(pkt_hdr, 0, pkt_hdr->buf_hdr.segcount);
>> > >> >>
>> > >> >>                 if (ref_count == 1)
>> > >> >>                         pkt_hdr->unshared_len =
>> pkt_hdr->frame_len;
>> > >> >>
>> > >> >>                 pkt_hdr = ref_hdr;
>> > >> >>         } while (pkt_hdr);
>> > >> >> }
>> > >> >>
>> > >> >> The problem here is that decrementing the ref_count, checking
>> > >> >> its value and updating unshared_len is not single atomic
>> > >> >> operation. By the time packet_free() checks if ref_count == 1
>> > >> >> (i.e. if there is exactly one other reference left somewhere),
>> > >> >> the true ref_count may have already been changed by another
>> > >> >> thread doing packet_free() or packet_ref().
>> > >> >>
>> > >> >> For example, if two threads have a reference to the same packet
>> > >> >> then execution (or the relevant memory ops) may get "interleaved"
>> > >> >> as follows:
>> > >> >>
>> > >> >> T1: call packet_free()
>> > >> >> T1: ref_count = packet_ref_count(pkt_hdr) - 1;
>> > >> >> At this point ref_count variable is 1
>> > >> >> T1: call free_bufs()
>> > >> >> T1: call packet_ref_dec()
>> > >> >> Now the ref_count of the packet header is 1.
>> > >> >> T2: call and complete packet_free()
>> > >> >> Thread 2 sees refcount 1 in the packet and frees the buffers
>> > >> >> T1: pkt_hdr->unshared_len = pkt_hdr->frame_len;
>> > >> >> Thread 1 accesses freed buffer for reading and writing.
>> > >> >
>> > >> > I agree. These steps should be reversed so that the code should
>> read:
>> > >> >
>> > >> > if (ref_count == 1)
>> > >> >    pkt_hdr->unshared_len = pkt_hdr->frame_len;
>> > >> >
>> > >> > free_bufs(pkt_hdr, 0, pkt_hdr->buf_hdr.segcount);
>> > >> >
>> > >> > Or using the code with the above two patches applied, the code
>> should
>> > >> read:
>> > >> >
>> > >> > static inline void packet_free(odp_packet_hdr_t *pkt_hdr)
>> > >> > {
>> > >> >         odp_packet_hdr_t *ref_hdr;
>> > >> >         uint32_t ref_count;
>> > >> >         int num_seg;
>> > >> >
>> > >> >         do {
>> > >> >                 ref_count = packet_ref_count(pkt_hdr);
>> > >> >                 num_seg = pkt_hdr->buf_hdr.segcount;
>> > >> >                 ref_hdr = pkt_hdr->ref_hdr;
>> > >> >
>> > >> >                 if (odp_likely((CONFIG_PACKET_MAX_SEGS == 1 ||
>> > num_seg
>> > >> == 1) &&
>> > >> >                     ref_count == 1)) {
>> > >> >                         buffer_free_multi((odp_buffer_t
>> > >> > *)&pkt_hdr->buf_hdr.handle.handle, 1);
>> > >> >                 } else {
>> > >> >                         if (ref_count == 2)
>> > >> >                                 pkt_hdr->unshared_len =
>> > >> pkt_hdr->frame_len;
>> > >> >
>> > >> >                         free_bufs(pkt_hdr, 0, num_seg);
>> > >> >                  }
>> > >> >
>> > >> >                  pkt_hdr = ref_hdr;
>> > >> >         } while (pkt_hdr);
>> > >> > }
>> > >> >
>> > >> > The mistake was trying to optimize things so that unshared_len is
>> not
>> > >> > set if the buffers are being freed, but that exposes these race
>> > >> > conditions. So the worst that should now happen is that it is set
>> > >> > unnecessarily before being freed.
>> > >> >
>> > >> > If you concur I'll fold this fix into a v3 for patch
>> > >> > http://patches.opendataplane.org/patch/8145/
>> > >> >
>> > >> >>
>> > >> >> Similarly, if T2 created a new reference, T1 would have
>> > >> >> a wrong idea of the number of remaining references and
>> > >> >> would adjust the unshared_len to an incorrect value.
>> > >> >>
>> > >> >> Right?
>> > >> >>
>> > >> >> Maybe other modifications of unshared_len are also racy.
>> > >> >
>> > >> > I don't believe so, because references do not change the existing
>> ODP
>> > >> > restriction that two threads cannot share the same odp_packet_t.
>> > When
>> > >> > a packet reference is created it returns a separate odp_packet_t
>> that
>> > >> > has its own metadata. So unshared_len is always private to an
>> > >> > individual odp_packet_t. The exception is static references but in
>> > >> > this case the entire
>> > >> > packet along with its metadata must be treated as read only so
>> > >> > operations like odp_packet_push_head() that would try to modify
>> > >> > unshared_len are prohibited.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> The second issue is that the atomic ops for setting and
>> > >> >> reading the ref count seem to have too relaxed memory
>> > >> >> ordering. In particular, packet_ref_dec() must not happen
>> > >> >> (be visible to other threads) before its caller is done
>> > >> >> with the packet and the related memory accesses have
>> > >> >> completed. Now there does not seem to be any optimization
>> > >> >> and memory barrier to prevent the ref count decrementing
>> > >> >> happening too early. So I think it is at least theoretically
>> > >> >> possible that a thread e.g. reads from a packet buffer
>> > >> >> after it has already been freed by another thread, somehow
>> > >> >> like this:
>> > >> >>
>> > >> >> Source code order:
>> > >> >> T1: interesting_data = read_from_pkt(pkt)
>> > >> >> T1: packet_free(pkt)
>> > >> >>
>> > >> >> Order visible to T2:
>> > >> >> 1: ref count decr
>> > >> >> 2: read from pkt
>> > >> >>
>> > >> >> Now if T2 goes and frees the remaining reference between
>> > >> >> steps 1 and 2, T1 may get even more interesting data.
>> > >> >>
>> > >> >> Right?
>> > >> >
>> > >> > I don't believe so. The semantics of odp_atomic_fetch_dec_u32(),
>> > which
>> > >> > is what packet_ref_dec() uses, says that no two calls can see the
>> > same
>> > >> > fetched value, so only one thread will return ref_count == 1 and
>> free
>> > >> > the buffer. Note that if I see ref_count == 1 no other thread can
>> be
>> > >> > trying to increment it via a concurrent odp_packet_ref() call
>> because
>> > >> > that would mean that two threads were trying to manipulate the same
>> > >> > odp_packet_t, which is prohibited.
>> > >> >
>> > >> > For CPUs that support out of order instruction execution, this is
>> > only
>> > >> > permitted providing the reordering and speculative executions are
>> > >> > semantically consistent with sequential execution. If this were not
>> > >> > the case you'd constantly have to worry about a processor turning
>> > >> >
>> > >> > T1: interesting_data = read_from_pkt(pkt)
>> > >> > T1: packet_free(pkt)
>> > >> >
>> > >> > into
>> > >> >
>> > >> > T1: packet_free(pkt)
>> > >> > T1: interesting_data = read_from_pkt(pkt)
>> > >> >
>> > >> > In your scenario above: T2 cannot be issuing a read to pkt after
>> > >> > ref_count is decremented because the only way that T2 could be
>> > >> > decrementing ref_count would be if T2 issued an odp_packet_free()
>> > call
>> > >> > for it. Obviously if it tries to reference pkt after such a call
>> that
>> > >> > is an application error.
>> > >> >
>> > >> > Thanks again for your much-appreciated help in looking at this!
>> > >> >
>> > >> >>
>> > >> >>         Janne
>> > >> >>
>> > >> >>
>>
>
>
>
> --
> [image: Linaro] <http://www.linaro.org/>
> François-Frédéric Ozog | *Director Linaro Networking Group*
> T: +33.67221.6485
> francois.o...@linaro.org | Skype: ffozog
>
>

Reply via email to