This post is a follow-up to the side chat that went on during today's ARCH call. It seems to me there's been a lot of confusion of late on this subject, mainly, I suspect, because people have been engaging in open-ended discussion rather than writing things down.

ODP provides an API framework. The framework is specified such that any application written to that framework is guaranteed to be source-code portable among all ODP implementations of that framework. That was part of the original charter of ODP and it is one we've demonstrated repeatedly since the beginning by running the same demo apps across many different ODP implementations. We shouldn't allow rumor to somehow suggest that we haven't done this.

What ODP is not is a complete programming environment, so it's important to appreciate what ODP does and does not do. Because ODP is not an OS or a substitute for an OS, it defers a number of things, such as the precise semantics of threads, to the implementation. However, this does not in any way affect ODP portability, because ODP is very precise about what the semantics of ODP APIs are.

ODP APIs operate on handles, not pointers. Handles are abstract data types that define ODP functionality. As such, ODP applications are written to use handles, not pointers. Moreover, the basic ODP architecture is based on message passing, not shared memory. It is thus independent of the various memory models that may be supported by platforms that implement ODP.

Please see the detailed discussion of point S19 in Christophe's main doc <https://collaborate.linaro.org/display/ODP/odp_thread+and+shmem+debate#odp_threadandshmemdebate-S19>, which seems to be the core issue here. I haven't seen any real disagreement with the taxonomy I outlined there.

For the specific use case of parallel processing of a single packet, as I've stated on the mailing list on a number of occasions, this is trivially handled within the current ODP APIs because the odp_packet_offset() API allows addressability to portions of a packet.

So ODP should have no problem whatsoever integrating with OpenMP <https://computing.llnl.gov/tutorials/openMP/>, for example:

    #include <stdint.h>
    #include <omp.h>
    #include <odp_api.h>

    void parallel_process_pkt(odp_packet_t pkt)
    {
            uint32_t pktlen = odp_packet_len(pkt);

            /* Fork a team of threads that each work on one slice of this packet */
            #pragma omp parallel
            {
                    /* Everything declared inside this block is private to the thread */
                    uint32_t tid = (uint32_t)omp_get_thread_num();
                    uint32_t numthreads = (uint32_t)omp_get_num_threads();
                    uint32_t chunk = pktlen / numthreads;
                    uint32_t offset = tid * chunk;
                    /* The last thread also takes any remainder bytes */
                    uint32_t myend = (tid == numthreads - 1) ? pktlen : offset + chunk;

                    while (offset < myend) {
                            uint32_t seglen;
                            void *addr = odp_packet_offset(pkt, offset, &seglen, NULL);

                            if (addr == NULL)
                                    break; /* offset is outside the packet */

                            /* Stay within this thread's slice */
                            if (seglen > myend - offset)
                                    seglen = myend - offset;

                            /* ...process seglen bytes of the packet starting at addr... */

                            offset += seglen;
                    }
            }
    }

I know of no other way that you'd expect a single packet to be processed in parallel, and this construct should be portable across any ODP implementation regardless of its memory model, precisely because all actual addresses used are thread-local, as intended.

Are there any other use cases that we need to concern ourselves with? If yes, please post them here so that they can be discussed.

As noted, the only API that ODP offers that refers to any sort of shared memory is the odp_shm_xxx() family of routines. odp_shm_reserve() takes a flags parameter that is intended to specify the sharing scope of the storage in this area, but (in Monarch) we haven't really said what these flags mean with any real precision. All we need to do in Tiger Moth is two things:

1. Specify a complete set of sharing flags and their associated semantics that we wish to define.

2. Recognize that not every ODP implementation may be able to offer all of these options, either at all or at acceptable performance levels. It thus becomes an application responsibility to specify the share scope it needs for each odp_shm_reserve() call it makes and to process any failure return codes appropriately.
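
As a rough sketch of what that application-side handling could look like, here is a self-contained mock in C. None of the names below are ODP APIs: mock_shm_reserve() is a hypothetical stand-in for odp_shm_reserve(), SHM_INVALID stands in for its failure return, and the SCOPE_* flags are invented placeholders for whatever set Tiger Moth ends up defining. The only point is the shape of the logic: the application lists the scopes it can work with, tries them in preference order, and treats "none available" as a normal outcome.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sharing scopes; these are NOT real ODP flag names. */
#define SCOPE_THREADS (1u << 0) /* threads of one process */
#define SCOPE_FORKED  (1u << 1) /* plus forked children */
#define SCOPE_GLOBAL  (1u << 2) /* unrelated processes too */

#define SHM_INVALID (-1)

/* Scopes this mock "implementation" supports; a real platform decides this. */
static uint32_t mock_supported_scopes = SCOPE_THREADS | SCOPE_FORKED;
static int mock_next_handle = 0;

/* Hypothetical stand-in for odp_shm_reserve(): fails for unsupported scopes. */
static int mock_shm_reserve(const char *name, uint64_t size, uint32_t scope)
{
    (void)name;
    if (size == 0 || (scope & mock_supported_scopes) != scope)
        return SHM_INVALID;
    return mock_next_handle++;
}

/*
 * Try each scope the application can work with, in preference order.
 * Returns a handle and stores the granted scope, or returns SHM_INVALID
 * if the implementation offers none of the acceptable scopes.
 */
static int reserve_with_scope(const char *name, uint64_t size,
                              const uint32_t *acceptable, int count,
                              uint32_t *granted)
{
    assert(acceptable != NULL && granted != NULL);

    for (int i = 0; i < count; i++) {
        int shm = mock_shm_reserve(name, size, acceptable[i]);

        if (shm != SHM_INVALID) {
            *granted = acceptable[i];
            return shm;
        }
    }
    return SHM_INVALID;
}
```

With the mock supporting only the thread and forked-child scopes, an application that lists { SCOPE_GLOBAL, SCOPE_FORKED } is granted SCOPE_FORKED, while one that can only work with SCOPE_GLOBAL gets SHM_INVALID back and has to decline to run.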

Leaving this to the application may mean that some ODP applications will not run on certain ODP implementations, but that's perfectly OK, since there are many reasons other than shm scope why an ODP application may have a preferred set of implementation requirements (e.g., the number of threads, queues, etc. supported).

Comments welcome and encouraged!
