On 24 December 2014 at 06:19, Maxim Uvarov <[email protected]> wrote:

> On 12/23/2014 10:40 PM, Mike Holmes wrote:
>
>>
>>
>> On 23 December 2014 at 13:49, Bill Fischofer <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>     That's good to know and certainly the code is cleaner that way.  I
>>     think we should adopt the practice that any architecture-specific
>>     changes to linux-generic code (adds, updates, or removals) must
>>     either originate from or have a Reviewed-by a representative of
>>     that architecture (in this case, Bala).  That way there's no
>>     confusion as to whether the known experts agree with the change,
>>     since these sort of things can have subtle implications that are
>>     not obvious to the layman.
>>
>>
>> I think linux-generic should be devoid of any per core optimizations in
>> its default state.
>> There could be in an arch directory and it would be supported as you say
>> Bill by someone who will care for that that variant, this also avoids
>> filling linux-generic with a possibly large list of preprocessor flags that
>> we can't test.
>>
>> In this case we don't have public access to the octeon to even know if
>> this code works so separating or deletion feels right to me, although
>> Cavium are adding their board in this case.
>>
>>  To have optimizations we should understand which per arch optimization
> we need. I see that if newer gcc supports good optimizations then we need
> to use that gcc. Most of people use Open Embedded and Linaro gcc which is
> almost latest version. So if there is no optimization in standard
> environment (gcc, kernel, libc) then arch implementation should go to
> platform specific git variant. If it's covered by standard environment then
> we simple updating minimal version requirement for required component.
>
>
Lets take an example ODP API file

static inline void odp_sync_stores(void)
{
#if defined __x86_64__ || defined __i386__

>-------__asm__  __volatile__ ("sfence\n" : : : "memory");

#elif defined(__arm__)
#if __ARM_ARCH == 6
>-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
>------->------->-------: : "r" (0) : "memory");
#elif __ARM_ARCH >= 7 || defined __aarch64__

>-------__asm__ __volatile__ ("dmb st" : : : "memory");
#else
>-------__asm__ __volatile__ ("" : : : "memory");
#endif

#elif defined __OCTEON__

>-------__asm__  __volatile__ ("syncws\n" : : : "memory");

#else
>-------__sync_synchronize();
#endif


How do we accommodate this without resorting to architecture specific
processor flags / assembly?

I would prefer to see platform/linux-generic/include/api/odp_sync.h changed
to

platform/linux-generic/include/api/x86/odp_sync.h
platform/linux-generic/include/api/arm/odp_sync.h
platform/linux-generic/include/api/octeon/odp_sync.h

This assumes that there is currently no pure c code that can be used that
is acceptable on all three platforms.

Or since linux-generic is not a performance platform but rather a reference
whenever we need to make a trade off between simplicity and performance, we
alter this to be a regular function and move to:-

platform/linux-generic/include/api/odp_sync.h <- now a pure forward
declaration of the API with doxygen documentation
and implementations correctly broken down by arch

arch/x86/odp_sync.h
arch/arm/odp_sync.h
arch/octeon/odp_sync.h



> Maxim.
>
>
>
>>     On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl
>>     <[email protected] <mailto:[email protected]>> wrote:
>>
>>         I had some discussions with Bala and we think the GCC
>>         generates the
>>         appropriate code for OCTEON so there should be no need for any
>>         OCTEON-specific inline assembler in odp_atomic.h. I just
>>         didn't want
>>         to remove this piece of code when I redesigned the atomic support
>>         because I didn't understand why there is a SYNCW instruction
>>         *before*
>>         the load. Load-acquire type of loads may need sync or barrier
>>         instructions but they come after the load, not before.
>>
>>         I think we should trust GCC to generate better code for OCTEON
>>         than us
>>         mere humans, at least until proven wrong. So please remove this
>>         OCTEON-specific snippet.
>>
>>         -- Ola
>>
>>
>>         On 23 December 2014 at 18:01, Mike Holmes
>>         <[email protected] <mailto:[email protected]>> wrote:
>>         >
>>         >
>>         > On 23 December 2014 at 11:15, Taras Kondratiuk
>>         <[email protected] <mailto:[email protected]
>> >>
>>
>>         > wrote:
>>         >>
>>         >> On 12/23/2014 05:28 PM, Mike Holmes wrote:
>>         >> > how will this work, Octeon does not care about
>>          __atomic_fetch_add(&v,
>>         >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I
>>         assume
>>         >> >
>>         >> > static inline uint32_t
>>         odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom)
>>         >> > {
>>         >> > #if defined __OCTEON__
>>         >> >>-------uint32_t ret;
>>         >> >>-------__asm__ __volatile__ ("syncws");
>>         >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret),
>>         "+m" (atom) :
>>         >> >>------->------->------- "r" (atom));
>>         >> >>-------return ret;
>>         >> > #else
>>         >> >>-------return __atomic_fetch_add(&atom->v, 1,
>>         __ATOMIC_RELAXED);
>>         >> > #endif
>>         >> > }
>>         >>
>>         >> Linux-generic shouldn't have arch specific parts, because it is
>>         >> *generic*. Instead it can fall back to strong __sync
>>         functions if
>>         >> __atomic are not available.
>>         >>
>>         >
>>         > Linux kernel is generic but has core specifics - is this not
>>         the same ?
>>         >
>>         > What about these cases ?
>>         >
>>         > platform/linux-generic/odp_system_info.c:static int
>>         cpuinfo_octeon(FILE
>>         > *file, odp_system_info_t *sysinfo)
>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>         "octeon",
>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>         > cpuinfo_octeon
>>         > platform/linux-generic/include/api/odp_align.h:#if defined
>>         __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/include/api/odp_sync.h:#if defined
>>         __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/include/odp_spin_internal.h:#if
>>         defined __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/odp_system_info.c:#if defined
>>         __x86_64__ || defined
>>         > __i386__ || defined __OCTEON__ || \
>>         > platform/linux-generic/odp_system_info.c:#if defined
>>         __x86_64__ || defined
>>         > __i386__
>>         > platform/linux-generic/odp_system_info.c:static int
>>         cpuinfo_x86(FILE *file,
>>         > odp_system_info_t *sysinfo)
>>         > platform/linux-generic/odp_system_info.c:    #if defined
>>         __x86_64__ ||
>>         > defined __i386__
>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>         "x86",
>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>         > cpuinfo_x86
>>         > platform/linux-generic/odp_system_info.c:#if defined
>>         __x86_64__ || defined
>>         > __i386__ || defined __OCTEON__ || \
>>         > platform/linux-generic/odp_time.c:#if defined __x86_64__ ||
>>         defined __i386__
>>         > platform/linux-generic/include/api/odp_align.h:#elif defined
>>         __arm__ ||
>>         > defined __aarch64__
>>         > platform/linux-generic/include/api/odp_sync.h:#elif
>>         defined(__arm__)
>>         > platform/linux-generic/include/odp_spin_internal.h:#elif
>>         defined __arm__
>>         > platform/linux-generic/odp_system_info.c:#elif defined
>>         __arm__ || defined
>>         > __aarch64__
>>         > platform/linux-generic/odp_system_info.c:static int
>>         cpuinfo_arm(FILE *file
>>         > ODP_UNUSED,
>>         > platform/linux-generic/odp_system_info.c:    #elif defined
>>         __arm__ ||
>>         > defined __aarch64__
>>         > platform/linux-generic/odp_system_info.c:    .cpu_arch_str =
>>         "arm",
>>         > platform/linux-generic/odp_system_info.c:    .cpuinfo_parser =
>>         > cpuinfo_arm
>>         >
>>         >
>>         >>
>>         >>
>>         >> Taras Kondratiuk
>>         >
>>         >
>>         >
>>         >
>>         > --
>>         > Mike Holmes
>>         > Linaro  Sr Technical Manager
>>         > LNG - ODP
>>
>>         _______________________________________________
>>         lng-odp mailing list
>>         [email protected] <mailto:[email protected]>
>>         http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>>
>>
>>
>> --
>> *Mike Holmes*
>> Linaro  Sr Technical Manager
>> LNG - ODP
>>
>>
>> _______________________________________________
>> lng-odp mailing list
>> [email protected]
>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>
>
>
> _______________________________________________
> lng-odp mailing list
> [email protected]
> http://lists.linaro.org/mailman/listinfo/lng-odp
>



-- 
*Mike Holmes*
Linaro  Sr Technical Manager
LNG - ODP
_______________________________________________
lng-odp mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/lng-odp

Reply via email to