Getting the ODP API files out of platform/include/api is the first step. But based on earlier attempts to deal with inlining Taras kept running into issues because GCC insists that functions to be inlined must have inline in their forward declarations. There's no simple way to do that without a lot of macro ugliness, but I suspect autotools could help with that.
Each ODP API specified in a common header may need to have a platform-specific inline specification attached to it, depending on what the implementation needs. If this were an all-or-nothing thing we could do that with a simple macro that gets set in odp_platform_types.h (or a similar file). Call it ODP_API and it's either set to null or to 'static inline'. Then the common ODP API files would say: ODP_API <rest of function prototype> for example, instead of today saying void *odp_buffer_addr(odp_buffer_t buf); the canonical form would be written as ODP_API void *odp_buffer_addr(odp_buffer_t buf); and we'd get the needed static inline (or not) depending on how the implementation is written. But it's not that simple because each API is independently inlineable, so a single macro won't do. We could have a separate macro for each API, but that gets unwieldy very quickly. Instead, the API file could contain some common symbols that get macro-expanded based on ./configure options so that what gets generated into platform/include/api is what's needed for that platform/arch/etc. This might also be a way to easily capture platform-specific doxygen since if the API files are being custom generated then they can have specific documentation attached to them as well. That might be a cleaner way of addressing Petri's earlier concern about odp_config.h containing linux-generic stuff mixed in with general ODP documentation. On Wed, Dec 24, 2014 at 9:28 AM, Mike Holmes <[email protected]> wrote: > > > On 24 December 2014 at 10:05, Bill Fischofer <[email protected]> > wrote: > >> I suspect the long-term answer will be to use the autotools to construct >> the actual include hierarchy (and possibly the files themelves) needed for >> a given --with-platform specification. >> > > Let me check I have the same terminology in my head - we should have this > defined in the arch doc but I did not look. > platform = a physical board with interfaces etc and a core of some > architecture as a complete solution > arch = a cpu core for example octeon, arm, x86 > > In this case I then see a need to pick up arch independently > from --with-platform so that the optimizations can be reused by all > platforms with that architecture core. If we dont I would see a need for > platform/linux-x86-generic > platform/linux-arm-generic > platform/linux-octeon-generic > > >> That would mean the canonical ODP API files may need to be meta-files >> that are used as input to ./configure to generate the actual ones used. >> > > If the work Taras and Anders are doing to clean out the API and move it > back to the root succeeds, and it looks good to me, then it opens the door > to some more wriggle room in solving this issue I think. > But I could see the configure step helping take a single API definition > and attaching it to the required optimizations as a solution if it cannot > be done within C / GCC > > > > >> >> On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer < >> [email protected]> wrote: >> >>> That makes sense. The only problem is GCC's problems with forward >>> declarations and inlining. IMO a design issue with the compiler, but it is >>> what it is. But we do need a general solution to this problem in 2015 >>> since proper inlining support will be critical for production-grade >>> implementations. >>> >>> On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <[email protected]> >>> wrote: >>> >>>> >>>> >>>> On 24 December 2014 at 06:19, Maxim Uvarov <[email protected]> >>>> wrote: >>>> >>>>> On 12/23/2014 10:40 PM, Mike Holmes wrote: >>>>> >>>>>> >>>>>> >>>>>> On 23 December 2014 at 13:49, Bill Fischofer < >>>>>> [email protected] <mailto:[email protected]>> wrote: >>>>>> >>>>>> That's good to know and certainly the code is cleaner that way. I >>>>>> think we should adopt the practice that any architecture-specific >>>>>> changes to linux-generic code (adds, updates, or removals) must >>>>>> either originate from or have a Reviewed-by a representative of >>>>>> that architecture (in this case, Bala). That way there's no >>>>>> confusion as to whether the known experts agree with the change, >>>>>> since these sort of things can have subtle implications that are >>>>>> not obvious to the layman. >>>>>> >>>>>> >>>>>> I think linux-generic should be devoid of any per core optimizations >>>>>> in its default state. >>>>>> There could be in an arch directory and it would be supported as you >>>>>> say Bill by someone who will care for that that variant, this also avoids >>>>>> filling linux-generic with a possibly large list of preprocessor flags >>>>>> that >>>>>> we can't test. >>>>>> >>>>>> In this case we don't have public access to the octeon to even know >>>>>> if this code works so separating or deletion feels right to me, although >>>>>> Cavium are adding their board in this case. >>>>>> >>>>>> To have optimizations we should understand which per arch >>>>> optimization we need. I see that if newer gcc supports good optimizations >>>>> then we need to use that gcc. Most of people use Open Embedded and Linaro >>>>> gcc which is almost latest version. So if there is no optimization in >>>>> standard environment (gcc, kernel, libc) then arch implementation should >>>>> go >>>>> to platform specific git variant. If it's covered by standard environment >>>>> then we simple updating minimal version requirement for required >>>>> component. >>>>> >>>>> >>>> Lets take an example ODP API file >>>> >>>> static inline void odp_sync_stores(void) >>>> { >>>> #if defined __x86_64__ || defined __i386__ >>>> >>>> >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); >>>> >>>> #elif defined(__arm__) >>>> #if __ARM_ARCH == 6 >>>> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ >>>> >------->------->-------: : "r" (0) : "memory"); >>>> #elif __ARM_ARCH >= 7 || defined __aarch64__ >>>> >>>> >-------__asm__ __volatile__ ("dmb st" : : : "memory"); >>>> #else >>>> >-------__asm__ __volatile__ ("" : : : "memory"); >>>> #endif >>>> >>>> #elif defined __OCTEON__ >>>> >>>> >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); >>>> >>>> #else >>>> >-------__sync_synchronize(); >>>> #endif >>>> >>>> >>>> How do we accommodate this without resorting to architecture specific >>>> processor flags / assembly? >>>> >>>> I would prefer to see platform/linux-generic/include/api/odp_sync.h >>>> changed to >>>> >>>> platform/linux-generic/include/api/x86/odp_sync.h >>>> platform/linux-generic/include/api/arm/odp_sync.h >>>> platform/linux-generic/include/api/octeon/odp_sync.h >>>> >>>> This assumes that there is currently no pure c code that can be used >>>> that is acceptable on all three platforms. >>>> >>>> Or since linux-generic is not a performance platform but rather a >>>> reference whenever we need to make a trade off between simplicity and >>>> performance, we alter this to be a regular function and move to:- >>>> >>>> platform/linux-generic/include/api/odp_sync.h <- now a pure forward >>>> declaration of the API with doxygen documentation >>>> and implementations correctly broken down by arch >>>> >>>> arch/x86/odp_sync.h >>>> arch/arm/odp_sync.h >>>> arch/octeon/odp_sync.h >>>> >>>> >>>> >>>>> Maxim. >>>>> >>>>> >>>>> >>>>>> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl >>>>>> <[email protected] <mailto:[email protected]>> >>>>>> wrote: >>>>>> >>>>>> I had some discussions with Bala and we think the GCC >>>>>> generates the >>>>>> appropriate code for OCTEON so there should be no need for any >>>>>> OCTEON-specific inline assembler in odp_atomic.h. I just >>>>>> didn't want >>>>>> to remove this piece of code when I redesigned the atomic >>>>>> support >>>>>> because I didn't understand why there is a SYNCW instruction >>>>>> *before* >>>>>> the load. Load-acquire type of loads may need sync or barrier >>>>>> instructions but they come after the load, not before. >>>>>> >>>>>> I think we should trust GCC to generate better code for OCTEON >>>>>> than us >>>>>> mere humans, at least until proven wrong. So please remove >>>>>> this >>>>>> OCTEON-specific snippet. >>>>>> >>>>>> -- Ola >>>>>> >>>>>> >>>>>> On 23 December 2014 at 18:01, Mike Holmes >>>>>> <[email protected] <mailto:[email protected]>> >>>>>> wrote: >>>>>> > >>>>>> > >>>>>> > On 23 December 2014 at 11:15, Taras Kondratiuk >>>>>> <[email protected] <mailto:taras.kondratiuk@ >>>>>> linaro.org>> >>>>>> >>>>>> > wrote: >>>>>> >> >>>>>> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >>>>>> >> > how will this work, Octeon does not care about >>>>>> __atomic_fetch_add(&v, >>>>>> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I >>>>>> assume >>>>>> >> > >>>>>> >> > static inline uint32_t >>>>>> odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) >>>>>> >> > { >>>>>> >> > #if defined __OCTEON__ >>>>>> >> >>-------uint32_t ret; >>>>>> >> >>-------__asm__ __volatile__ ("syncws"); >>>>>> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), >>>>>> "+m" (atom) : >>>>>> >> >>------->------->------- "r" (atom)); >>>>>> >> >>-------return ret; >>>>>> >> > #else >>>>>> >> >>-------return __atomic_fetch_add(&atom->v, 1, >>>>>> __ATOMIC_RELAXED); >>>>>> >> > #endif >>>>>> >> > } >>>>>> >> >>>>>> >> Linux-generic shouldn't have arch specific parts, because >>>>>> it is >>>>>> >> *generic*. Instead it can fall back to strong __sync >>>>>> functions if >>>>>> >> __atomic are not available. >>>>>> >> >>>>>> > >>>>>> > Linux kernel is generic but has core specifics - is this not >>>>>> the same ? >>>>>> > >>>>>> > What about these cases ? >>>>>> > >>>>>> > platform/linux-generic/odp_system_info.c:static int >>>>>> cpuinfo_octeon(FILE >>>>>> > *file, odp_system_info_t *sysinfo) >>>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str >>>>>> = >>>>>> "octeon", >>>>>> > platform/linux-generic/odp_system_info.c: >>>>>> .cpuinfo_parser = >>>>>> > cpuinfo_octeon >>>>>> > platform/linux-generic/include/api/odp_align.h:#if defined >>>>>> __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/include/api/odp_sync.h:#if defined >>>>>> __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/include/odp_spin_internal.h:#if >>>>>> defined __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>>> __x86_64__ || defined >>>>>> > __i386__ || defined __OCTEON__ || \ >>>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>>> __x86_64__ || defined >>>>>> > __i386__ >>>>>> > platform/linux-generic/odp_system_info.c:static int >>>>>> cpuinfo_x86(FILE *file, >>>>>> > odp_system_info_t *sysinfo) >>>>>> > platform/linux-generic/odp_system_info.c: #if defined >>>>>> __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str >>>>>> = >>>>>> "x86", >>>>>> > platform/linux-generic/odp_system_info.c: >>>>>> .cpuinfo_parser = >>>>>> > cpuinfo_x86 >>>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>>> __x86_64__ || defined >>>>>> > __i386__ || defined __OCTEON__ || \ >>>>>> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || >>>>>> defined __i386__ >>>>>> > platform/linux-generic/include/api/odp_align.h:#elif >>>>>> defined >>>>>> __arm__ || >>>>>> > defined __aarch64__ >>>>>> > platform/linux-generic/include/api/odp_sync.h:#elif >>>>>> defined(__arm__) >>>>>> > platform/linux-generic/include/odp_spin_internal.h:#elif >>>>>> defined __arm__ >>>>>> > platform/linux-generic/odp_system_info.c:#elif defined >>>>>> __arm__ || defined >>>>>> > __aarch64__ >>>>>> > platform/linux-generic/odp_system_info.c:static int >>>>>> cpuinfo_arm(FILE *file >>>>>> > ODP_UNUSED, >>>>>> > platform/linux-generic/odp_system_info.c: #elif defined >>>>>> __arm__ || >>>>>> > defined __aarch64__ >>>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str >>>>>> = >>>>>> "arm", >>>>>> > platform/linux-generic/odp_system_info.c: >>>>>> .cpuinfo_parser = >>>>>> > cpuinfo_arm >>>>>> > >>>>>> > >>>>>> >> >>>>>> >> >>>>>> >> Taras Kondratiuk >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > Mike Holmes >>>>>> > Linaro Sr Technical Manager >>>>>> > LNG - ODP >>>>>> >>>>>> _______________________________________________ >>>>>> lng-odp mailing list >>>>>> [email protected] <mailto:[email protected]> >>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *Mike Holmes* >>>>>> Linaro Sr Technical Manager >>>>>> LNG - ODP >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> lng-odp mailing list >>>>>> [email protected] >>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> lng-odp mailing list >>>>> [email protected] >>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>> >>>> >>>> >>>> >>>> -- >>>> *Mike Holmes* >>>> Linaro Sr Technical Manager >>>> LNG - ODP >>>> >>>> _______________________________________________ >>>> lng-odp mailing list >>>> [email protected] >>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>> >>>> >>> >> > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP >
_______________________________________________ lng-odp mailing list [email protected] http://lists.linaro.org/mailman/listinfo/lng-odp
