This post is a follow-up to the side chat that went on during today's ARCH
call. It seems to me there's been a lot of confusion of late on this
subject, which I suspect is mainly due to people not writing things down
and instead engaging in open-ended discussion.

ODP provides an API framework. The framework is specified such that any
application written to that framework is guaranteed to be source-code
portable among all ODP implementations of that framework. That was part of
the original charter of ODP and it is one we've demonstrated repeatedly
since the beginning by running the same demo apps across many different ODP
implementations. We shouldn't allow rumor to somehow suggest that we
haven't done this.

What ODP is not is a complete programming environment. So it's important to
appreciate what ODP does and does not do. Because ODP is not an OS or a
substitute for an OS, it defers a number of things such as the precise
semantics of threads to the implementation. However, this does not in any
way affect ODP portability, because ODP is very precise about the semantics
of its own APIs.

ODP APIs operate on handles, not pointers. Handles are abstract data types
that define ODP functionality. As such, ODP applications are written to use
handles, not pointers. Moreover, the basic ODP architecture is based on
message passing, not shared memory. It is thus independent of various
memory models that may be supported by platforms that implement ODP.
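To make the handle model concrete, here is a toy sketch of the idea. This
is not real ODP code; the type and table names are invented for
illustration. The point is that the application only ever holds an opaque
token, while the implementation keeps all real addresses private:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ODP handle type such as odp_packet_t.
 * The application sees only this opaque token, never a pointer. */
typedef uint32_t pkt_handle_t;

#define MAX_PKTS 4

/* Implementation-private state, hidden behind the handle. A real
 * implementation would map handles to hardware descriptors, pool
 * entries, etc. */
static uint32_t pkt_len_table[MAX_PKTS];

/* Accessors mirror the ODP style: they take a handle and return a
 * value, so the application never dereferences implementation memory. */
static void toy_packet_set_len(pkt_handle_t h, uint32_t len)
{
        pkt_len_table[h] = len;
}

static uint32_t toy_packet_len(pkt_handle_t h)
{
        return pkt_len_table[h];
}
```

Because the application never holds a raw address, the implementation is
free to place its state wherever its memory model requires.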

Please see the detailed discussion of point S19 in Christophe's main doc
<https://collaborate.linaro.org/display/ODP/odp_thread+and+shmem+debate#odp_threadandshmemdebate-S19>,
which seems to be the core issue here. I haven't seen any real disagreement
with the taxonomy I outlined there.

For the specific use case of parallel processing of a single packet, as
I've stated on the mailing list on a number of occasions, this is trivially
handled within the current ODP APIs because the odp_packet_offset() API
provides addressability to portions of a packet. So ODP should have no
problem whatsoever integrating with OpenMP
<https://computing.llnl.gov/tutorials/openMP/>, for example:

void parallel_process_pkt(odp_packet_t pkt)
{
        int tid, numthreads;
        uint32_t offset, chunk, myend;
        uint32_t pktlen = odp_packet_len(pkt);
        void *addr;
        uint32_t seglen;

        /* Fork a team of threads that work on individual chunks of
         * this packet */
        #pragma omp parallel private(tid, numthreads, offset, chunk, \
                                     myend, addr, seglen)
        {
                tid        = omp_get_thread_num();
                numthreads = omp_get_num_threads();

                /* Round the chunk size up so the last thread also
                 * covers any remainder bytes */
                chunk  = (pktlen + numthreads - 1) / numthreads;
                offset = tid * chunk;
                myend  = offset + chunk < pktlen ? offset + chunk : pktlen;

                while (offset < myend) {
                        addr = odp_packet_offset(pkt, offset, &seglen, NULL);
                        if (addr != NULL) {
                                /* ...process this segment of the packet
                                 * for up to min(seglen, myend - offset)
                                 * bytes */
                        }
                        offset += seglen;
                }
        }
}

I know of no other way that you'd expect a single packet to be processed
in parallel, and this construct should be portable across any ODP
implementation regardless of its memory model, precisely because all
actual addresses used are thread-local, as intended.

Are there any other use cases that we need to concern ourselves with? If
yes, please post them here so that they can be discussed.

As noted, the only API ODP offers that refers to any sort of shared memory
is the odp_shm_xxx() family of routines. odp_shm_reserve() takes a flags
parameter that is intended to specify the sharing scope of the storage in
this area, but (in Monarch) we haven't said what these flags mean with any
real precision. All we need to do in Tiger Moth are two things:

1. Specify a complete set of sharing flags and their associated semantics
that we wish to define.

2. Recognize that not every ODP implementation may be able to offer all of
these options, either at all or at acceptable performance levels.

It thus becomes an application responsibility to specify the share scope
it needs for each odp_shm_reserve() call it makes and to process any
failure return codes appropriately. This may mean that some ODP
applications may not run on certain ODP implementations, but that's
perfectly OK, since there are many reasons why an ODP application may have
a preferred set of implementation requirements other than shm scope (e.g.,
the number of threads, queues, etc. supported).
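The application-side pattern is simple: request the scope you need, check
the return code, and fall back or fail cleanly. Here is a hedged,
self-contained sketch of that pattern; toy_shm_reserve() and the TOY_SHM_*
flag names are invented stand-ins (the real API is odp_shm_reserve() with
the flags we'd define in Tiger Moth):

```c
#include <stddef.h>

/* Made-up scope flags for illustration only */
#define TOY_SHM_SW_ONLY 0x1 /* visible to this process's threads only */
#define TOY_SHM_PROC    0x2 /* shareable across processes */

static char toy_pool[1024];

/* Stand-in for odp_shm_reserve(): a real implementation may return
 * failure when a requested share scope is unsupported on the platform */
static void *toy_shm_reserve(const char *name, size_t size, unsigned flags)
{
        (void)name;
        if (flags & TOY_SHM_PROC) /* pretend this scope is unsupported */
                return NULL;
        return size <= sizeof(toy_pool) ? toy_pool : NULL;
}

/* Application pattern: ask for the preferred scope, check for failure,
 * and fall back to a narrower scope rather than assuming success */
static void *reserve_or_fallback(size_t size)
{
        void *addr = toy_shm_reserve("app_tbl", size, TOY_SHM_PROC);

        if (addr == NULL)
                addr = toy_shm_reserve("app_tbl", size, TOY_SHM_SW_ONLY);
        return addr;
}
```

An application that can't operate with the fallback scope would instead
treat the NULL return as a fatal "this implementation doesn't support me"
condition, which is exactly the portability contract described above.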

Comments welcome and encouraged!
