Re: [PATCH RFC 0/3] API for 128-bit IO access
On Fri, Jan 26, 2018 at 06:11:49PM +, Will Deacon wrote: > On Fri, Jan 26, 2018 at 12:05:42PM +0300, Yury Norov wrote: > > On Wed, Jan 24, 2018 at 10:22:13AM +, Will Deacon wrote: > > > On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote: > > > > This series adds API for 128-bit memory IO access and enables it for > > > > ARM64. > > > > The original motivation for 128-bit API came from new Cavium network > > > > device > > > > driver. The hardware requires 128-bit access to make things work. See > > > > description in patch 3 for details. > > > > > > > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and > > > > API for 128-bit access would be helpful in core arm64 code. > > > > > > Only for normal, cacheable memory, so they're not suitable for IO accesses > > > as you're proposing here. > > > > Hi Will, > > > > Thanks for clarification. > > > > Could you elaborate, do you find 128-bit read/write API useless, or > > you just correct my comment? > > > > I think that ordered uniform 128-bit access API would be helpful, even > > if not atomic. > > Sorry, but I strongly disagree here. Having an IO accessor that isn't > guaranteed to be atomic is a recipe for disaster if it's not called out > explicitly. You're much better off implementing something along the lines > of using 2x64-bit accessors like we already > have for the 2x32-bit case. > > However, that doesn't solve your problem and is somewhat of a distraction. > I'd suggest that in your case, where you have a device that relies on > 128-bit atomic access that is assumedly tightly integrated into your SoC, > then the driver just codes it's own local implementation of the accessor, > given that there isn't a way to guarantee the atomicity architecturally > (and even within your SoC it might not be atomic to all endpoints). OK. Understand that. So we'll drop this RFC and implement those accessors in driver. Thank you and all for the review. Maybe later I'll submit 128-bit unification patch that was discussed here. Yury
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Fri, Jan 26, 2018 at 12:05:42PM +0300, Yury Norov wrote: > On Wed, Jan 24, 2018 at 10:22:13AM +, Will Deacon wrote: > > On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote: > > > This series adds API for 128-bit memory IO access and enables it for > > > ARM64. > > > The original motivation for 128-bit API came from new Cavium network > > > device > > > driver. The hardware requires 128-bit access to make things work. See > > > description in patch 3 for details. > > > > > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and > > > API for 128-bit access would be helpful in core arm64 code. > > > > Only for normal, cacheable memory, so they're not suitable for IO accesses > > as you're proposing here. > > Hi Will, > > Thanks for clarification. > > Could you elaborate, do you find 128-bit read/write API useless, or > you just correct my comment? > > I think that ordered uniform 128-bit access API would be helpful, even > if not atomic. Sorry, but I strongly disagree here. Having an IO accessor that isn't guaranteed to be atomic is a recipe for disaster if it's not called out explicitly. You're much better off implementing something along the lines of using 2x64-bit accessors like we already have for the 2x32-bit case. However, that doesn't solve your problem and is somewhat of a distraction. I'd suggest that in your case, where you have a device that relies on 128-bit atomic access that is assumedly tightly integrated into your SoC, then the driver just codes it's own local implementation of the accessor, given that there isn't a way to guarantee the atomicity architecturally (and even within your SoC it might not be atomic to all endpoints). Will
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Wed, Jan 24, 2018 at 10:22:13AM +, Will Deacon wrote: > On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote: > > This series adds API for 128-bit memory IO access and enables it for ARM64. > > The original motivation for 128-bit API came from new Cavium network device > > driver. The hardware requires 128-bit access to make things work. See > > description in patch 3 for details. > > > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and > > API for 128-bit access would be helpful in core arm64 code. > > Only for normal, cacheable memory, so they're not suitable for IO accesses > as you're proposing here. Hi Will, Thanks for clarification. Could you elaborate, do you find 128-bit read/write API useless, or you just correct my comment? I think that ordered uniform 128-bit access API would be helpful, even if not atomic. Yury.
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Thu, Jan 25, 2018 at 12:38 PM, Yury Norovwrote: > On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote: >> On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov >> wrote: > Thanks for doing this test. Looking at this I realize that this is > not the architecture feature but compiler feature. So if we add > 128-bit interface, it would be reasonable to add it for all targets > that compiled with toolchain supporting 128-bit accesses. > > There's already the option ARCH_SUPPORTS_INT128 that is enabled for > x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in > arch/arm64/Makefile: > KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, > -DCONFIG_ARCH_SUPPORTS_INT128) > > It is used in include/linux/math64.h and in lib/ubsan.c, not so wide. > > So I find things little messed. Crypto code ignores compilers' ability > to operate with 128-bit numbers. Ubsan and math64 relies on compiler > version (at least for arm64, and I doubt it would work correctly with clang). > And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory > access. > > I think it's time to unify 128-bit code: > - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check >it like you do below; > - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128 >in generic include/linux/int128.h, as you suggest here; > - switch this series to ARCH_SUPPORTS_INT128. > > Does it sound reasonable? The CONFIG_* symbol namespace should not be set dynamically. However, you can define a symbol with another name, e.g. ARCH_SUPPORTS_INT128 (without CONFIG_ prefix) in linux/compiler-gcc.h based on the version and BITS_PER_LONG. Arnd
Re: [PATCH RFC 0/3] API for 128-bit IO access
On 25/01/18 11:38, Yury Norov wrote: On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote: On Wed, Jan 24, 2018 at 10:05 AM, Yury Norovwrote: This series adds API for 128-bit memory IO access and enables it for ARM64. The original motivation for 128-bit API came from new Cavium network device driver. The hardware requires 128-bit access to make things work. See description in patch 3 for details. We might also want to do something similar to the include/linux/io-64-nonatomic-lo-hi.h and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on other targets. It's apparently driver specific which half you need to do first to make it work, so we need both. OK, will do. Also, starting from ARMv8.4, stp and ldp instructions become atomic, and API for 128-bit access would be helpful in core arm64 code. This series is RFC. I'd like to collect opinions on idea and implementation details. * I didn't implement all 128-bit operations existing for 64-bit variables and other types (__swab128p etc). Do we need them all right now, or we can add them when actually needed? I think in this case it's better to do them all at once. Ack. * u128 name is already used in crypto code. So here I use __uint128_t that comes from GCC for 128-bit types. Should I rename existing type in crypto and make core code for 128-bit variables consistent with u64, u32 etc? (I think yes, but would like to ask crypto people for it.) Hmm, that file probably predates the __uint128_t support. My guess would be that the crypto code using it can actually benefit from the new types as well, so maybe move the existing file to include/linux/int128.h and add an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that is available. It sounds reasonable. But I worry about testing that changes. Hope, crypto community will help with it. * Some compilers don't support __uint128_t, so I protected all generic code with config option HAVE_128BIT_ACCESS. I think it's OK, but... That would be nicely solved by using the #if/#else definition above. * For 128-bit read/write functions I take suffix 'o', which means read/write the octet of bytes. Is this name OK? Can't think of anything better. It's not an octet though, but 16 bytes ('q' is for quadword, meaning four 16-bit words in Intel terminology). Ah, sure. Octet of words. Will change it. * my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I don't have other BE setup on hand, so BE case is formally not tested. BE code for arm64 is looking well though. I've run it through my collection of compilers, it seems that most but not all 64-bit targets support it (exceptions appear to be older versions of gcc for s390x and parisc), and none of the 32-bit targets do: Thanks for doing this test. Looking at this I realize that this is not the architecture feature but compiler feature. So if we add 128-bit interface, it would be reasonable to add it for all targets that compiled with toolchain supporting 128-bit accesses. There's already the option ARCH_SUPPORTS_INT128 that is enabled for x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in arch/arm64/Makefile: KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128) It is used in include/linux/math64.h and in lib/ubsan.c, not so wide. So I find things little messed. Crypto code ignores compilers' ability to operate with 128-bit numbers. Ubsan and math64 relies on compiler version (at least for arm64, and I doubt it would work correctly with clang). And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory access. I think it's time to unify 128-bit code: - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check it like you do below; - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128 in generic include/linux/int128.h, as you suggest here; - switch this series to ARCH_SUPPORTS_INT128. Does it sound reasonable? It probably is about time to formalise the current scattered fragments of uint_128_t support, but to reiterate Will's comment it is almost certainly not worth the effort to implement 'generic' I/O accessors which only work under implementation-defined and undiscoverable hardware conditions, and will be unusable on the overwhelming majority of systems. Just open-code LDP/STP accessors in the one driver which needs them and (by definition) only runs on SoCs where they *are* known to work correctly. Robin. Yury $ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ; echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote: > On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov> wrote: > > This series adds API for 128-bit memory IO access and enables it for ARM64. > > The original motivation for 128-bit API came from new Cavium network device > > driver. The hardware requires 128-bit access to make things work. See > > description in patch 3 for details. > > We might also want to do something similar to the > include/linux/io-64-nonatomic-lo-hi.h > and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on > other targets. It's apparently driver specific which half you need to do first > to make it work, so we need both. OK, will do. > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and > > API for 128-bit access would be helpful in core arm64 code. > > > > This series is RFC. I'd like to collect opinions on idea and implementation > > details. > > * I didn't implement all 128-bit operations existing for 64-bit variables > > and other types (__swab128p etc). Do we need them all right now, or we > > can add them when actually needed? > > I think in this case it's better to do them all at once. Ack. > > * u128 name is already used in crypto code. So here I use __uint128_t that > > comes from GCC for 128-bit types. Should I rename existing type in crypto > > and make core code for 128-bit variables consistent with u64, u32 etc? (I > > think yes, but would like to ask crypto people for it.) > > Hmm, that file probably predates the __uint128_t support. My guess would > be that the crypto code using it can actually benefit from the new types as > well, so maybe move the existing file to include/linux/int128.h and add > an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that > is available. It sounds reasonable. But I worry about testing that changes. Hope, crypto community will help with it. > > * Some compilers don't support __uint128_t, so I protected all generic code > > with config option HAVE_128BIT_ACCESS. I think it's OK, but... > > That would be nicely solved by using the #if/#else definition above. > > > * For 128-bit read/write functions I take suffix 'o', which means read/write > > the octet of bytes. Is this name OK? > > Can't think of anything better. It's not an octet though, but 16 bytes > ('q' is for quadword, meaning four 16-bit words in Intel terminology). Ah, sure. Octet of words. Will change it. > > * my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I > > don't have other BE setup on hand, so BE case is formally not tested. > > BE code for arm64 is looking well though. > > I've run it through my collection of compilers, it seems that most but not > all 64-bit targets support it (exceptions appear to be older versions of > gcc for s390x and parisc), and none of the 32-bit targets do: Thanks for doing this test. Looking at this I realize that this is not the architecture feature but compiler feature. So if we add 128-bit interface, it would be reasonable to add it for all targets that compiled with toolchain supporting 128-bit accesses. There's already the option ARCH_SUPPORTS_INT128 that is enabled for x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in arch/arm64/Makefile: KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128) It is used in include/linux/math64.h and in lib/ubsan.c, not so wide. So I find things little messed. Crypto code ignores compilers' ability to operate with 128-bit numbers. Ubsan and math64 relies on compiler version (at least for arm64, and I doubt it would work correctly with clang). And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory access. I think it's time to unify 128-bit code: - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check it like you do below; - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128 in generic include/linux/int128.h, as you suggest here; - switch this series to ARCH_SUPPORTS_INT128. Does it sound reasonable? Yury > $ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ; > echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.5.0 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-6.3.1 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.0 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.1 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.1.1 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.2.1 ok > /home/arnd/cross-gcc/bin/aarch64-linux-gcc-8.0.0 ok > /home/arnd/cross-gcc/bin/alpha-linux-gcc-4.1.3 ok >
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Wed, Jan 24, 2018 at 4:05 AM, Yury Norovwrote: > > ... > With all that, this example code: > > static int __init 128bit_test(void) > { > __uint128_t v; > __uint128_t addr; > __uint128_t val = (__uint128_t) 0x1234567890abc; > ... In case it matters, you can check for GCC support of the 128-bit types in userland with: #if (__SIZEOF_INT128__ >= 16) ... #endif Also see https://gcc.gnu.org/ml/gcc-help/2015-08/msg00185.html . Jeff
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Wed, Jan 24, 2018 at 12:28 PM, Arnd Bergmannwrote: > On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov > wrote: >> * For 128-bit read/write functions I take suffix 'o', which means read/write >> the octet of bytes. Is this name OK? > > Can't think of anything better. It's not an octet though, but 16 bytes > ('q' is for quadword, meaning four 16-bit words in Intel terminology). It's apparently follows Intel's terminology by implying "word", so, "octetword". -- With Best Regards, Andy Shevchenko
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Wed, Jan 24, 2018 at 10:05 AM, Yury Norovwrote: > This series adds API for 128-bit memory IO access and enables it for ARM64. > The original motivation for 128-bit API came from new Cavium network device > driver. The hardware requires 128-bit access to make things work. See > description in patch 3 for details. We might also want to do something similar to the include/linux/io-64-nonatomic-lo-hi.h and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on other targets. It's apparently driver specific which half you need to do first to make it work, so we need both. > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and > API for 128-bit access would be helpful in core arm64 code. > > This series is RFC. I'd like to collect opinions on idea and implementation > details. > * I didn't implement all 128-bit operations existing for 64-bit variables > and other types (__swab128p etc). Do we need them all right now, or we > can add them when actually needed? I think in this case it's better to do them all at once. > * u128 name is already used in crypto code. So here I use __uint128_t that > comes from GCC for 128-bit types. Should I rename existing type in crypto > and make core code for 128-bit variables consistent with u64, u32 etc? (I > think yes, but would like to ask crypto people for it.) Hmm, that file probably predates the __uint128_t support. My guess would be that the crypto code using it can actually benefit from the new types as well, so maybe move the existing file to include/linux/int128.h and add an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that is available. > * Some compilers don't support __uint128_t, so I protected all generic code > with config option HAVE_128BIT_ACCESS. I think it's OK, but... That would be nicely solved by using the #if/#else definition above. > * For 128-bit read/write functions I take suffix 'o', which means read/write > the octet of bytes. Is this name OK? Can't think of anything better. It's not an octet though, but 16 bytes ('q' is for quadword, meaning four 16-bit words in Intel terminology). > * my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I > don't have other BE setup on hand, so BE case is formally not tested. > BE code for arm64 is looking well though. I've run it through my collection of compilers, it seems that most but not all 64-bit targets support it (exceptions appear to be older versions of gcc for s390x and parisc), and none of the 32-bit targets do: $ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ; echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.5.0 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-6.3.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.0 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.0.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.1.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-7.2.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-8.0.0 ok /home/arnd/cross-gcc/bin/alpha-linux-gcc-4.1.3 ok /home/arnd/cross-gcc/bin/alpha-linux-gcc-4.3.6 ok /home/arnd/cross-gcc/bin/alpha-linux-gcc-4.9.3 ok /home/arnd/cross-gcc/bin/alpha-linux-gcc-7.2.1 ok /home/arnd/cross-gcc/bin/am33_2.0-linux-gcc-4.9.3 :1:1: error: unknown type name '__uint128_t' /home/arnd/cross-gcc/bin/am33_2.0-linux-gcc-5.2.1 :1:1: error: unknown type name '__uint128_t' /home/arnd/cross-gcc/bin/am33_2.0-linux-gcc-7.2.1 :1:1: error: unknown type name '__uint128_t'; did you mean '__int128'? /home/arnd/cross-gcc/bin/arc-elf-gcc-7.2.1 :1:1: error: unknown type name '__uint128_t'; did you mean '__int128'? /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-3.4.6 ok /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.3.6 :1: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'v' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.4.7 :1: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'v' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.5.4 :1:13: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'v' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.6.4 :1:1: error: unknown type name '__uint128_t' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.7.4 :1:1: error: unknown type name '__uint128_t' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.8.5 :1:1: error: unknown type name '__uint128_t' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.9.0 :1:1: error: unknown type name '__uint128_t' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.9.1 :1:1: error: unknown type name '__uint128_t' /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-4.9.2 :1:1: error: unknown type name '__uint128_t'
Re: [PATCH RFC 0/3] API for 128-bit IO access
On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote: > This series adds API for 128-bit memory IO access and enables it for ARM64. > The original motivation for 128-bit API came from new Cavium network device > driver. The hardware requires 128-bit access to make things work. See > description in patch 3 for details. > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and > API for 128-bit access would be helpful in core arm64 code. Only for normal, cacheable memory, so they're not suitable for IO accesses as you're proposing here. Will
[PATCH RFC 0/3] API for 128-bit IO access
Hi all, This series adds API for 128-bit memory IO access and enables it for ARM64. The original motivation for 128-bit API came from new Cavium network device driver. The hardware requires 128-bit access to make things work. See description in patch 3 for details. Also, starting from ARMv8.4, stp and ldp instructions become atomic, and API for 128-bit access would be helpful in core arm64 code. This series is RFC. I'd like to collect opinions on idea and implementation details. * I didn't implement all 128-bit operations existing for 64-bit variables and other types (__swab128p etc). Do we need them all right now, or we can add them when actually needed? * u128 name is already used in crypto code. So here I use __uint128_t that comes from GCC for 128-bit types. Should I rename existing type in crypto and make core code for 128-bit variables consistent with u64, u32 etc? (I think yes, but would like to ask crypto people for it.) * Some compilers don't support __uint128_t, so I protected all generic code with config option HAVE_128BIT_ACCESS. I think it's OK, but... * For 128-bit read/write functions I take suffix 'o', which means read/write the octet of bytes. Is this name OK? * my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I don't have other BE setup on hand, so BE case is formally not tested. BE code for arm64 is looking well though. With all that, this example code: static int __init 128bit_test(void) { __uint128_t v; __uint128_t addr; __uint128_t val = (__uint128_t) 0x1234567890abc; val |= ((__uint128_t) 0xdeadbeaf) << 64; writeo(val, ); v = reado(); pr_err("%llx%llx\n", (u64) (val >> 64), (u64) val); pr_err("%llx%llx\n", (u64) (v >> 64), (u64) v); return v != val; } Generates this listing for arm64-le: <128bit_test>: 0: a9bb7bfdstp x29, x30, [sp, #-80]! 4: 910003fdmov x29, sp 8: a90153f3stp x19, x20, [sp, #16] c: a9025bf5stp x21, x22, [sp, #32] 10: f9001bf7str x23, [sp, #48] 14: d5033e9fdsb st 18: d2815797mov x23, #0xabc // #2748 1c: d297d5f6mov x22, #0xbeaf// #48815 20: f2acf137movkx23, #0x6789, lsl #16 24: f2bbd5b6movkx22, #0xdead, lsl #16 28: f2c468b7movkx23, #0x2345, lsl #32 2c: f2e00037movkx23, #0x1, lsl #48 30: a9045bb7stp x23, x22, [x29, #64] 34: a94453b3ldp x19, x20, [x29, #64] 38: d5033d9fdsb ld 3c: 9015adrpx21, 0 <128bit_test> 40: 910002b5add x21, x21, #0x0 44: aa1703e2mov x2, x23 48: aa1603e1mov x1, x22 4c: aa1503e0mov x0, x21 50: 9400bl 0 54: aa1303e2mov x2, x19 58: aa1403e1mov x1, x20 5c: ca170273eor x19, x19, x23 60: ca160294eor x20, x20, x22 64: aa1503e0mov x0, x21 68: aa140273orr x19, x19, x20 6c: 9400bl 0 70: f9401bf7ldr x23, [sp, #48] 74: f100027fcmp x19, #0x0 78: a94153f3ldp x19, x20, [sp, #16] 7c: 1a9f07e0csetw0, ne // ne = any 80: a9425bf5ldp x21, x22, [sp, #32] 84: a8c57bfdldp x29, x30, [sp], #80 88: d65f03c0ret And for arm64-be: <128bit_test>: 0: a9bb7bfdstp x29, x30, [sp, #-80]! 4: 910003fdmov x29, sp 8: a90153f3stp x19, x20, [sp, #16] c: a9025bf5stp x21, x22, [sp, #32] 10: f9001bf7str x23, [sp, #48] 14: d5033e9fdsb st 18: d2802001mov x1, #0x100 // #256 1c: d2d5bbc0mov x0, #0xadde // #191168994344960 20: f2a8a461movkx1, #0x4523, lsl #16 24: f2f5f7c0movkx0, #0xafbe, lsl #48 28: f2d12ce1movkx1, #0x8967, lsl #32 2c: f2f78141movkx1, #0xbc0a, lsl #48 30: a90407a0stp x0, x1, [x29, #64] 34: a94453b3ldp x19, x20, [x29, #64] 38: dac00e73rev x19, x19 3c: dac00e94rev x20, x20 40: d5033d9fdsb ld 44: d2815796mov x22, #0xabc // #2748 48: 9015adrpx21, 0 <128bit_test> 4c: f2acf136movkx22, #0x6789, lsl #16 50: 910002b5add x21, x21, #0x0 54: f2c468b6movkx22, #0x2345, lsl #32 58: d297d5f7mov x23, #0xbeaf// #48815 5c: f2e00036movkx22, #0x1, lsl #48 60: f2bbd5b7movkx23, #0xdead, lsl #16 64: aa1603e2mov x2, x22 68: aa1703e1mov x1, x23 6c: