Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-29 Thread Yury Norov
On Fri, Jan 26, 2018 at 06:11:49PM +, Will Deacon wrote:
> On Fri, Jan 26, 2018 at 12:05:42PM +0300, Yury Norov wrote:
> > On Wed, Jan 24, 2018 at 10:22:13AM +, Will Deacon wrote:
> > > On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote:
> > > > This series adds API for 128-bit memory IO access and enables it for 
> > > > ARM64.
> > > > The original motivation for 128-bit API came from new Cavium network 
> > > > device
> > > > driver. The hardware requires 128-bit access to make things work. See
> > > > description in patch 3 for details.
> > > > 
> > > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> > > > API for 128-bit access would be helpful in core arm64 code.
> > > 
> > > Only for normal, cacheable memory, so they're not suitable for IO accesses
> > > as you're proposing here.
> > 
> > Hi Will,
> > 
> > Thanks for clarification.
> > 
> > Could you elaborate, do you find 128-bit read/write API useless, or
> > you just correct my comment?
> > 
> > I think that ordered uniform 128-bit access API would be helpful, even
> > if not atomic.
> 
> Sorry, but I strongly disagree here. Having an IO accessor that isn't
> guaranteed to be atomic is a recipe for disaster if it's not called out
> explicitly. You're much better off implementing something along the lines
> of  using 2x64-bit accessors like we already
> have for the 2x32-bit case.
> 
> However, that doesn't solve your problem and is somewhat of a distraction.
> I'd suggest that in your case, where you have a device that relies on
> 128-bit atomic access that is assumedly tightly integrated into your SoC,
> then the driver just codes it's own local implementation of the accessor,
> given that there isn't a way to guarantee the atomicity architecturally
> (and even within your SoC it might not be atomic to all endpoints).

OK. Understand that. So we'll drop this RFC and implement those accessors 
in driver.

Thank you and all for the review. Maybe later I'll submit 128-bit unification
patch that was discussed here.

Yury


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-26 Thread Will Deacon
On Fri, Jan 26, 2018 at 12:05:42PM +0300, Yury Norov wrote:
> On Wed, Jan 24, 2018 at 10:22:13AM +, Will Deacon wrote:
> > On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote:
> > > This series adds API for 128-bit memory IO access and enables it for 
> > > ARM64.
> > > The original motivation for 128-bit API came from new Cavium network 
> > > device
> > > driver. The hardware requires 128-bit access to make things work. See
> > > description in patch 3 for details.
> > > 
> > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> > > API for 128-bit access would be helpful in core arm64 code.
> > 
> > Only for normal, cacheable memory, so they're not suitable for IO accesses
> > as you're proposing here.
> 
> Hi Will,
> 
> Thanks for clarification.
> 
> Could you elaborate, do you find 128-bit read/write API useless, or
> you just correct my comment?
> 
> I think that ordered uniform 128-bit access API would be helpful, even
> if not atomic.

Sorry, but I strongly disagree here. Having an IO accessor that isn't
guaranteed to be atomic is a recipe for disaster if it's not called out
explicitly. You're much better off implementing something along the lines
of  using 2x64-bit accessors like we already
have for the 2x32-bit case.

However, that doesn't solve your problem and is somewhat of a distraction.
I'd suggest that in your case, where you have a device that relies on
128-bit atomic access that is assumedly tightly integrated into your SoC,
then the driver just codes it's own local implementation of the accessor,
given that there isn't a way to guarantee the atomicity architecturally
(and even within your SoC it might not be atomic to all endpoints).

Will


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-26 Thread Yury Norov
On Wed, Jan 24, 2018 at 10:22:13AM +, Will Deacon wrote:
> On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote:
> > This series adds API for 128-bit memory IO access and enables it for ARM64.
> > The original motivation for 128-bit API came from new Cavium network device
> > driver. The hardware requires 128-bit access to make things work. See
> > description in patch 3 for details.
> > 
> > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> > API for 128-bit access would be helpful in core arm64 code.
> 
> Only for normal, cacheable memory, so they're not suitable for IO accesses
> as you're proposing here.

Hi Will,

Thanks for clarification.

Could you elaborate, do you find 128-bit read/write API useless, or
you just correct my comment?

I think that ordered uniform 128-bit access API would be helpful, even
if not atomic.

Yury.


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-25 Thread Arnd Bergmann
On Thu, Jan 25, 2018 at 12:38 PM, Yury Norov  wrote:
> On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote:
>> On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov  
>> wrote:

> Thanks for doing this test. Looking at this I realize that this is
> not the architecture feature but compiler feature. So if we add
> 128-bit interface, it would be reasonable to add it for all targets
> that compiled with toolchain supporting 128-bit accesses.
>
> There's already the option ARCH_SUPPORTS_INT128 that is enabled for
> x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in
> arch/arm64/Makefile:
> KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, 
> -DCONFIG_ARCH_SUPPORTS_INT128)
>
> It is used in include/linux/math64.h and in lib/ubsan.c, not so wide.
>
> So I find things little messed. Crypto code ignores compilers' ability
> to operate with 128-bit numbers. Ubsan and math64 relies on compiler
> version (at least for arm64, and I doubt it would work correctly with clang).
> And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory
> access.
>
> I think it's time to unify 128-bit code:
>  - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check
>it like you do below;
>  - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128
>in generic include/linux/int128.h, as you suggest here;
>  - switch this series to ARCH_SUPPORTS_INT128.
>
> Does it sound reasonable?

The CONFIG_* symbol namespace should not be set dynamically. However, you
can define a symbol with another name, e.g. ARCH_SUPPORTS_INT128 (without
CONFIG_ prefix) in linux/compiler-gcc.h based on the version and BITS_PER_LONG.

 Arnd


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-25 Thread Robin Murphy

On 25/01/18 11:38, Yury Norov wrote:

On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote:

On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov  wrote:

This series adds API for 128-bit memory IO access and enables it for ARM64.
The original motivation for 128-bit API came from new Cavium network device
driver. The hardware requires 128-bit access to make things work. See
description in patch 3 for details.


We might also want to do something similar to the
include/linux/io-64-nonatomic-lo-hi.h
and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on
other targets. It's apparently driver specific which half you need to do first
to make it work, so we need both.


OK, will do.
  

Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
API for 128-bit access would be helpful in core arm64 code.

This series is RFC. I'd like to collect opinions on idea and implementation
details.
* I didn't implement all 128-bit operations existing for 64-bit variables
and other types (__swab128p etc). Do we need them all right now, or we
can add them when actually needed?


I think in this case it's better to do them all at once.


Ack.


* u128 name is already used in crypto code. So here I use __uint128_t that
comes from GCC for 128-bit types. Should I rename existing type in crypto
and make core code for 128-bit variables consistent with u64, u32 etc? (I
think yes, but would like to ask crypto people for it.)


Hmm, that file probably predates the __uint128_t support. My guess would
be that the crypto code using it can actually benefit from the new types as
well, so maybe move the existing file to include/linux/int128.h and add
an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that
is available.


It sounds reasonable. But I worry about testing that changes. Hope,
crypto community will help with it.


* Some compilers don't support __uint128_t, so I protected all generic code
with config option HAVE_128BIT_ACCESS. I think it's OK, but...


That would be nicely solved by using the #if/#else definition above.


* For 128-bit read/write functions I take suffix 'o', which means read/write
the octet of bytes. Is this name OK?


Can't think of anything better. It's not an octet though, but 16 bytes
('q' is for quadword, meaning four 16-bit words in Intel terminology).


Ah, sure. Octet of words. Will change it.


* my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I
don't have other BE setup on hand, so BE case is formally not tested.
BE code for arm64 is looking well though.


I've run it through my collection of compilers, it seems that most but not
all 64-bit targets support it (exceptions appear to be older versions of
gcc for s390x and parisc), and none of the 32-bit targets do:


Thanks for doing this test. Looking at this I realize that this is
not the architecture feature but compiler feature. So if we add
128-bit interface, it would be reasonable to add it for all targets
that compiled with toolchain supporting 128-bit accesses.

There's already the option ARCH_SUPPORTS_INT128 that is enabled for
x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in
arch/arm64/Makefile:
KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)

It is used in include/linux/math64.h and in lib/ubsan.c, not so wide.

So I find things little messed. Crypto code ignores compilers' ability
to operate with 128-bit numbers. Ubsan and math64 relies on compiler
version (at least for arm64, and I doubt it would work correctly with clang).
And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory
access.

I think it's time to unify 128-bit code:
  - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check
it like you do below;
  - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128
in generic include/linux/int128.h, as you suggest here;
  - switch this series to ARCH_SUPPORTS_INT128.

Does it sound reasonable?


It probably is about time to formalise the current scattered fragments 
of uint_128_t support, but to reiterate Will's comment it is almost 
certainly not worth the effort to implement 'generic' I/O accessors 
which only work under implementation-defined and undiscoverable hardware 
conditions, and will be unusable on the overwhelming majority of 
systems. Just open-code LDP/STP accessors in the one driver which needs 
them and (by definition) only runs on SoCs where they *are* known to 
work correctly.


Robin.



Yury


$ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ;
echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok

Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-25 Thread Yury Norov
On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote:
> On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov  
> wrote:
> > This series adds API for 128-bit memory IO access and enables it for ARM64.
> > The original motivation for 128-bit API came from new Cavium network device
> > driver. The hardware requires 128-bit access to make things work. See
> > description in patch 3 for details.
> 
> We might also want to do something similar to the
> include/linux/io-64-nonatomic-lo-hi.h
> and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on
> other targets. It's apparently driver specific which half you need to do first
> to make it work, so we need both.

OK, will do.
 
> > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> > API for 128-bit access would be helpful in core arm64 code.
> >
> > This series is RFC. I'd like to collect opinions on idea and implementation
> > details.
> > * I didn't implement all 128-bit operations existing for 64-bit variables
> > and other types (__swab128p etc). Do we need them all right now, or we
> > can add them when actually needed?
> 
> I think in this case it's better to do them all at once.

Ack.

> > * u128 name is already used in crypto code. So here I use __uint128_t that
> > comes from GCC for 128-bit types. Should I rename existing type in crypto
> > and make core code for 128-bit variables consistent with u64, u32 etc? (I
> > think yes, but would like to ask crypto people for it.)
> 
> Hmm, that file probably predates the __uint128_t support. My guess would
> be that the crypto code using it can actually benefit from the new types as
> well, so maybe move the existing file to include/linux/int128.h and add
> an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that
> is available.

It sounds reasonable. But I worry about testing that changes. Hope,
crypto community will help with it.

> > * Some compilers don't support __uint128_t, so I protected all generic code
> > with config option HAVE_128BIT_ACCESS. I think it's OK, but...
> 
> That would be nicely solved by using the #if/#else definition above.
> 
> > * For 128-bit read/write functions I take suffix 'o', which means read/write
> > the octet of bytes. Is this name OK?
> 
> Can't think of anything better. It's not an octet though, but 16 bytes
> ('q' is for quadword, meaning four 16-bit words in Intel terminology).

Ah, sure. Octet of words. Will change it.

> > * my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I
> > don't have other BE setup on hand, so BE case is formally not tested.
> > BE code for arm64 is looking well though.
> 
> I've run it through my collection of compilers, it seems that most but not
> all 64-bit targets support it (exceptions appear to be older versions of
> gcc for s390x and parisc), and none of the 32-bit targets do:

Thanks for doing this test. Looking at this I realize that this is
not the architecture feature but compiler feature. So if we add
128-bit interface, it would be reasonable to add it for all targets
that compiled with toolchain supporting 128-bit accesses.

There's already the option ARCH_SUPPORTS_INT128 that is enabled for 
x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in 
arch/arm64/Makefile:
KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)

It is used in include/linux/math64.h and in lib/ubsan.c, not so wide.

So I find things little messed. Crypto code ignores compilers' ability
to operate with 128-bit numbers. Ubsan and math64 relies on compiler
version (at least for arm64, and I doubt it would work correctly with clang).
And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory
access.

I think it's time to unify 128-bit code:
 - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check
   it like you do below;
 - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128
   in generic include/linux/int128.h, as you suggest here;
 - switch this series to ARCH_SUPPORTS_INT128.

Does it sound reasonable?

Yury

> $ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ;
> echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.5.0 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-6.3.1 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.0 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.1 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.1.1 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.2.1 ok
> /home/arnd/cross-gcc/bin/aarch64-linux-gcc-8.0.0 ok
> /home/arnd/cross-gcc/bin/alpha-linux-gcc-4.1.3 ok
> 

Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-24 Thread Jeffrey Walton
On Wed, Jan 24, 2018 at 4:05 AM, Yury Norov  wrote:
>
> ...
> With all that, this example code:
>
> static int __init 128bit_test(void)
> {
> __uint128_t v;
> __uint128_t addr;
> __uint128_t val = (__uint128_t) 0x1234567890abc;
> ...

In case it matters, you can check for GCC support of the 128-bit types
in userland with:

#if (__SIZEOF_INT128__ >= 16)
   ...
#endif

Also see https://gcc.gnu.org/ml/gcc-help/2015-08/msg00185.html .

Jeff


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-24 Thread Andy Shevchenko
On Wed, Jan 24, 2018 at 12:28 PM, Arnd Bergmann  wrote:
> On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov  
> wrote:

>> * For 128-bit read/write functions I take suffix 'o', which means read/write
>> the octet of bytes. Is this name OK?
>
> Can't think of anything better. It's not an octet though, but 16 bytes
> ('q' is for quadword, meaning four 16-bit words in Intel terminology).

It's apparently follows Intel's terminology by implying "word", so, "octetword".

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-24 Thread Arnd Bergmann
On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov  wrote:
> This series adds API for 128-bit memory IO access and enables it for ARM64.
> The original motivation for 128-bit API came from new Cavium network device
> driver. The hardware requires 128-bit access to make things work. See
> description in patch 3 for details.

We might also want to do something similar to the
include/linux/io-64-nonatomic-lo-hi.h
and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on
other targets. It's apparently driver specific which half you need to do first
to make it work, so we need both.

> Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> API for 128-bit access would be helpful in core arm64 code.
>
> This series is RFC. I'd like to collect opinions on idea and implementation
> details.
> * I didn't implement all 128-bit operations existing for 64-bit variables
> and other types (__swab128p etc). Do we need them all right now, or we
> can add them when actually needed?

I think in this case it's better to do them all at once.

> * u128 name is already used in crypto code. So here I use __uint128_t that
> comes from GCC for 128-bit types. Should I rename existing type in crypto
> and make core code for 128-bit variables consistent with u64, u32 etc? (I
> think yes, but would like to ask crypto people for it.)

Hmm, that file probably predates the __uint128_t support. My guess would
be that the crypto code using it can actually benefit from the new types as
well, so maybe move the existing file to include/linux/int128.h and add
an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that
is available.

> * Some compilers don't support __uint128_t, so I protected all generic code
> with config option HAVE_128BIT_ACCESS. I think it's OK, but...

That would be nicely solved by using the #if/#else definition above.

> * For 128-bit read/write functions I take suffix 'o', which means read/write
> the octet of bytes. Is this name OK?

Can't think of anything better. It's not an octet though, but 16 bytes
('q' is for quadword, meaning four 16-bit words in Intel terminology).

> * my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I
> don't have other BE setup on hand, so BE case is formally not tested.
> BE code for arm64 is looking well though.

I've run it through my collection of compilers, it seems that most but not
all 64-bit targets support it (exceptions appear to be older versions of
gcc for s390x and parisc), and none of the 32-bit targets do:

$ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ;
echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.5.0 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-6.3.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.0 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.1.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.2.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-8.0.0 ok
/home/arnd/cross-gcc/bin/alpha-linux-gcc-4.1.3 ok
/home/arnd/cross-gcc/bin/alpha-linux-gcc-4.3.6 ok
/home/arnd/cross-gcc/bin/alpha-linux-gcc-4.9.3 ok
/home/arnd/cross-gcc/bin/alpha-linux-gcc-7.2.1 ok
/home/arnd/cross-gcc/bin/am33_2.0-linux-gcc-4.9.3 :1:1: error:
unknown type name '__uint128_t'
/home/arnd/cross-gcc/bin/am33_2.0-linux-gcc-5.2.1 :1:1: error:
unknown type name '__uint128_t'
/home/arnd/cross-gcc/bin/am33_2.0-linux-gcc-7.2.1 :1:1: error:
unknown type name '__uint128_t'; did you mean '__int128'?
/home/arnd/cross-gcc/bin/arc-elf-gcc-7.2.1 :1:1: error: unknown
type name '__uint128_t'; did you mean '__int128'?
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-3.4.6 ok
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.3.6 :1: error:
expected '=', ',', ';', 'asm' or '__attribute__' before 'v'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.4.7 :1: error:
expected '=', ',', ';', 'asm' or '__attribute__' before 'v'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.5.4 :1:13:
error: expected '=', ',', ';', 'asm' or '__attribute__' before 'v'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.6.4 :1:1:
error: unknown type name '__uint128_t'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.7.4 :1:1:
error: unknown type name '__uint128_t'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.8.5 :1:1:
error: unknown type name '__uint128_t'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.9.0 :1:1:
error: unknown type name '__uint128_t'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.9.1 :1:1:
error: unknown type name '__uint128_t'
/home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.9.2 :1:1:
error: unknown type name '__uint128_t'

Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-24 Thread Will Deacon
On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote:
> This series adds API for 128-bit memory IO access and enables it for ARM64.
> The original motivation for 128-bit API came from new Cavium network device
> driver. The hardware requires 128-bit access to make things work. See
> description in patch 3 for details.
> 
> Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> API for 128-bit access would be helpful in core arm64 code.

Only for normal, cacheable memory, so they're not suitable for IO accesses
as you're proposing here.

Will


[PATCH RFC 0/3] API for 128-bit IO access

2018-01-24 Thread Yury Norov
Hi all,

This series adds API for 128-bit memory IO access and enables it for ARM64.
The original motivation for 128-bit API came from new Cavium network device
driver. The hardware requires 128-bit access to make things work. See
description in patch 3 for details.

Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
API for 128-bit access would be helpful in core arm64 code.

This series is RFC. I'd like to collect opinions on idea and implementation
details.
* I didn't implement all 128-bit operations existing for 64-bit variables
and other types (__swab128p etc). Do we need them all right now, or we
can add them when actually needed?
* u128 name is already used in crypto code. So here I use __uint128_t that
comes from GCC for 128-bit types. Should I rename existing type in crypto
and make core code for 128-bit variables consistent with u64, u32 etc? (I
think yes, but would like to ask crypto people for it.)
* Some compilers don't support __uint128_t, so I protected all generic code
with config option HAVE_128BIT_ACCESS. I think it's OK, but... 
* For 128-bit read/write functions I take suffix 'o', which means read/write
the octet of bytes. Is this name OK?
* my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I
don't have other BE setup on hand, so BE case is formally not tested.
BE code for arm64 is looking well though.

With all that, this example code:

static int __init 128bit_test(void)
{
__uint128_t v;
__uint128_t addr;
__uint128_t val = (__uint128_t) 0x1234567890abc;

val |= ((__uint128_t) 0xdeadbeaf) << 64;

writeo(val, );
v = reado();

pr_err("%llx%llx\n", (u64) (val >> 64), (u64) val);
pr_err("%llx%llx\n", (u64) (v >> 64), (u64) v);
return v != val;
}

Generates this listing for arm64-le:

 <128bit_test>:
   0:   a9bb7bfdstp x29, x30, [sp, #-80]!
   4:   910003fdmov x29, sp
   8:   a90153f3stp x19, x20, [sp, #16]
   c:   a9025bf5stp x21, x22, [sp, #32]
  10:   f9001bf7str x23, [sp, #48]
  14:   d5033e9fdsb st
  18:   d2815797mov x23, #0xabc // #2748
  1c:   d297d5f6mov x22, #0xbeaf// #48815
  20:   f2acf137movkx23, #0x6789, lsl #16
  24:   f2bbd5b6movkx22, #0xdead, lsl #16
  28:   f2c468b7movkx23, #0x2345, lsl #32
  2c:   f2e00037movkx23, #0x1, lsl #48
  30:   a9045bb7stp x23, x22, [x29, #64]
  34:   a94453b3ldp x19, x20, [x29, #64]
  38:   d5033d9fdsb ld
  3c:   9015adrpx21, 0 <128bit_test>
  40:   910002b5add x21, x21, #0x0
  44:   aa1703e2mov x2, x23
  48:   aa1603e1mov x1, x22
  4c:   aa1503e0mov x0, x21
  50:   9400bl  0 
  54:   aa1303e2mov x2, x19
  58:   aa1403e1mov x1, x20
  5c:   ca170273eor x19, x19, x23
  60:   ca160294eor x20, x20, x22
  64:   aa1503e0mov x0, x21
  68:   aa140273orr x19, x19, x20
  6c:   9400bl  0 
  70:   f9401bf7ldr x23, [sp, #48]
  74:   f100027fcmp x19, #0x0
  78:   a94153f3ldp x19, x20, [sp, #16]
  7c:   1a9f07e0csetw0, ne  // ne = any
  80:   a9425bf5ldp x21, x22, [sp, #32]
  84:   a8c57bfdldp x29, x30, [sp], #80
  88:   d65f03c0ret

And for arm64-be:

 <128bit_test>:
   0:   a9bb7bfdstp x29, x30, [sp, #-80]!
   4:   910003fdmov x29, sp
   8:   a90153f3stp x19, x20, [sp, #16]
   c:   a9025bf5stp x21, x22, [sp, #32]
  10:   f9001bf7str x23, [sp, #48]
  14:   d5033e9fdsb st
  18:   d2802001mov x1, #0x100  // #256
  1c:   d2d5bbc0mov x0, #0xadde // 
#191168994344960
  20:   f2a8a461movkx1, #0x4523, lsl #16
  24:   f2f5f7c0movkx0, #0xafbe, lsl #48
  28:   f2d12ce1movkx1, #0x8967, lsl #32
  2c:   f2f78141movkx1, #0xbc0a, lsl #48
  30:   a90407a0stp x0, x1, [x29, #64]
  34:   a94453b3ldp x19, x20, [x29, #64]
  38:   dac00e73rev x19, x19
  3c:   dac00e94rev x20, x20
  40:   d5033d9fdsb ld
  44:   d2815796mov x22, #0xabc // #2748
  48:   9015adrpx21, 0 <128bit_test>
  4c:   f2acf136movkx22, #0x6789, lsl #16
  50:   910002b5add x21, x21, #0x0
  54:   f2c468b6movkx22, #0x2345, lsl #32
  58:   d297d5f7mov x23, #0xbeaf// #48815
  5c:   f2e00036movkx22, #0x1, lsl #48
  60:   f2bbd5b7movkx23, #0xdead, lsl #16
  64:   aa1603e2mov x2, x22
  68:   aa1703e1mov x1, x23
  6c: