Send MinGW-Notify mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.osdn.me/mailman/listinfo/mingw-notify
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of MinGW-Notify digest..."


Please do not reply to this notification; the sender address is unable to 
accept incoming e-mail.  If you wish to unsubscribe you can do so at 
https://lists.osdn.me/mailman/listinfo/mingw-notify.



Today's Topics:

   1. [mingw] #39565: Incorrect alignment of SIMD vectors returned
      by value from functions (MinGW Notification List)
   2. [mingw] #39565: Incorrect alignment of SIMD vectors returned
      by value from functions (MinGW Notification List)


----------------------------------------------------------------------

Message: 1
Date: Tue, 13 Apr 2021 19:29:00 +0100
From: MinGW Notification List <[email protected]>
To: OSDN Ticket System <[email protected]>
Subject: [MinGW-Notify] [mingw] #39565: Incorrect alignment of SIMD
        vectors returned by value from functions
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

#39565: Incorrect alignment of SIMD vectors returned by value from functions

  Open Date: 2019-09-12 07:47
Last Update: 2021-04-13 19:29

URL for this Ticket:
    https://osdn.net//projects/mingw/ticket/39565
RSS feed for this Ticket:
    https://osdn.net/ticket/ticket_rss.php?group_id=3917&tid=39565

---------------------------------------------------------------------

Last Changes/Comment on this Ticket:
2021-04-13 19:29 Updated by: keith
 * Status Update from Open to Closed
 * Resolution Update from None to Invalid

Comment:

Thank you for your report, but may I ask why you have raised the issue here?  
You are using a (trademark-infringing) x86_64-w64-mingw32 GCC compiler suite, 
(which, in spite of its anomalous mingw32 host identification, appears to 
generate 64-bit code); this compiler suite appears to originate from the MSYS2 
project, which is not, in any way, associated with the MinGW Project, (the 
legitimate owner of the MinGW trademark).  Consequently, I am inclined to close 
this, as an "invalid" ticket.
Before I do close it, however, I will offer the following comments:
If I compile your test case, with my own mingw32-g++ (GNU/Linux hosted 
GCC-9.2.0 cross-compiler), I see somewhat different assembly for your 
crash_vector_ret function:
$ mingw32-g++ -S -O0 -DARRAY_SIZE=32 -o- -masm=intel -mavx2 main.cc | grep -v 
cfi | less -FX
...
        .text
        .globl  __Z16crash_vector_retPh
        .def    __Z16crash_vector_retPh;        .scl    2;      .type   32;     
.endef
__Z16crash_vector_retPh:
LFB5437:
        push    ebp
        mov     ebp, esp
        and     esp, -32
        sub     esp, 64
        mov     eax, DWORD PTR [ebp+8]
        mov     DWORD PTR [esp+28], eax
        mov     eax, DWORD PTR [esp+28]
        vmovdqu xmm0, XMMWORD PTR [eax]
        vinserti128     ymm0, ymm0, XMMWORD PTR [eax+16], 0x1
        vmovdqa YMMWORD PTR [esp+32], ymm0
        vmovdqa ymm0, YMMWORD PTR [esp+32]
        leave
        ret
Ignoring that this is 32-bit assembly, (and if I actually try to run it, it 
crashes, not with the SIGSEGV you report, but with a SIGILL "illegal 
instruction" exception), do note the additional:
        and     esp, -32
instruction, at the beginning of the local stack frame set-up, for which no 
corresponding instruction appears in your GDB disassembly: this ensures that 
the local frame itself, and thus any variable addressed at a 32-byte offset 
within it, will be correctly aligned on a 32-byte boundary.
Alternatively, if I compile with the Linux-native GCC-10.2.0 compiler, I see:
$ g++ -S -O0 -DARRAY_SIZE=32 -o- -masm=intel -mavx2 main.cc | grep -v cfi | 
less -FX
...
        .text
        .size   _ZlsRSoRKDv4_x, .-_ZlsRSoRKDv4_x
        .globl  _Z16crash_vector_retPh
        .type   _Z16crash_vector_retPh, @function
_Z16crash_vector_retPh:
.LFB5667:
        push    rbp
        mov     rbp, rsp
        and     rsp, -32
        mov     QWORD PTR -56[rsp], rdi
        mov     rax, QWORD PTR -56[rsp]
        mov     QWORD PTR -40[rsp], rax
        mov     rax, QWORD PTR -40[rsp]
        vmovdqu xmm0, XMMWORD PTR [rax]
        vinserti128     ymm0, ymm0, XMMWORD PTR 16[rax], 0x1
        vmovdqa YMMWORD PTR -32[rsp], ymm0
        vmovdqa ymm0, YMMWORD PTR -32[rsp]
        leave
        ret
Obviously, this is now 64-bit Linux-native code, but it too has that 
corresponding:
        and     rsp, -32
instruction, to achieve the required stack frame alignment.  (Alas, if I try to 
run this, it too crashes with a SIGILL exception, on the first vmovdqu 
instruction; I guess the Intel Celeron N4000 64-bit processor, in my laptop, 
either doesn't support the AVX2 instructions, or they are disabled in the 
firmware configuration).

---------------------------------------------------------------------
Ticket Status:

      Reporter: michal_fapso
         Owner: (None)
          Type: Issues
        Status: Closed
      Priority: 5 - Medium
     MileStone: (None)
     Component: MSYS
      Severity: 5 - Medium
    Resolution: Invalid
---------------------------------------------------------------------

Ticket details:

Summary of the Issue
Working with AVX2 SIMD vectors. When a __m256i value is returned from a 
function, the SIMD register ymm0 is copied to memory using the aligned move 
instruction vmovdqa, so the destination memory address must be aligned to 32 
bytes. This requirement is not met in my test case and the program crashes with 
SIGSEGV. More info follows in the "Minimal Test Case" section below.
I tested it also on Ubuntu's gcc, but it didn't crash there, neither did with 
clang++ on Windows and Ubuntu, so therefore I am submitting this issue to the 
MinGW tracker. But I might be wrong and the issue may exist in gcc as well.
Host Operating System Information and Version
Windows 10 Home, Version 10.0.17134 Build 17134
with latest updated MSYS2.
GCC Version
$ gcc -v
Using built-in specs.
COLLECT_GCC=D:\msys64_latest\mingw64\bin\gcc.exe
COLLECT_LTO_WRAPPER=D:/msys64_latest/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-9.2.0/configure --prefix=/mingw64 
--with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 
--host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 
--with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include 
--libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 
--with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ 
--enable-shared --enable-static --enable-libatomic --enable-threads=posix 
--enable-graphite --enable-fully-dynamic-string 
--enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes 
--disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check 
--enable-lto --enable-libgomp --disable-multilib --enable-checking=release 
--disable-rpath --disable-win32-registry --disable-nls --disable-werror 
--disable-symvers --enable-plugin --with-libiconv --with-system-zlib 
--with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 
--with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' 
--with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 9.2.0 (Rev2, Built by MSYS2 project)
Binutils Version
$ ld -v
GNU ld (GNU Binutils) 2.32
MinGW Version
I don't know what should I look for in the mingw/include/_mingw.h file.
Build Environment
$ uname -a
MINGW64_NT-10.0-17134 MISO-PC 3.0.7-338.x86_64 2019-07-11 10:58 UTC x86_64 Msys
Minimal Self-Contained Test Case
main.cpp:
 #include <iostream> #include <immintrin.h>  #define DBG(var_name) 
std::cout<<#var_name": "<<(var_name)<<std::endl  // Output operator for vector 
std::ostream& operator<<(std::ostream& oss, const __m256i& v) {         
constexpr size_t length_bytes = 32;         unsigned char a[length_bytes];      
   _mm256_storeu_si256(reinterpret_cast<__m256i*>(a), v);         oss << "[";   
      std::string sep = "";         for (size_t i=0; i<length_bytes; i++) {     
            oss << sep << int(a[i]);                 sep = " ";         }       
  return oss << "]"; }  __m256i __attribute__ ((noinline)) 
crash_vector_ret(uint8_t* a) {         __m256i v;         v = 
_mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));         // Crash       
  return v; }  int main(int argc, char** argv) {         // Setup memory from 
which the vector will be loaded         const int a_size = ARRAY_SIZE;         
uint8_t a[a_size];         for (volatile int i=0; i<a_size; i++) {              
   a[i] = i;         }         DBG(alignof(__m256i));          __m256i vr;      
   vr = crash_vector_ret(a);         DBG(vr);          return 0; }Makefile:
 CXXFLAGS += -mavx2 CXXFLAGS += -std=c++17 CXXFLAGS += -g CXX = g++  main.exe: 
main.cpp         $(CXX) $(CXXFLAGS) -O1 -DARRAY_SIZE=32 -o $@ $<  test: 
main.cpp         for COMPILER in g++; do \                 ASM_FLAGS="-S 
-fverbose-asm -masm=intel"; \                 for ARRAY_SIZE in 32 36; do \     
                    for OPT in 0 1 2 3; do \                                 
NAME="main_O$${OPT}_$${COMPILER}_s$${ARRAY_SIZE}"; \                            
     echo "--------------------------------------------------"; \               
                  echo "NAME:$${NAME}"; \                                 rm -f 
$${NAME}.exe; \                                 $${COMPILER} $(CXXFLAGS) 
-DARRAY_SIZE=$${ARRAY_SIZE} -O$${OPT}               -o $${NAME}.exe main.cpp; \ 
                                $${COMPILER} $(CXXFLAGS) 
-DARRAY_SIZE=$${ARRAY_SIZE} -O$${OPT} $${ASM_FLAGS} -o $${NAME}.s   main.cpp; \ 
                                ./$${NAME}.exe; \                         done; 
\                 done; \         doneTrying various compilation options:
$ make test
for COMPILER in g++; do \
        ASM_FLAGS="-S -fverbose-asm -masm=intel"; \
        for ARRAY_SIZE in 32 36; do \
                for OPT in 0 1 2 3; do \
                        NAME="main_O${OPT}_${COMPILER}_s${ARRAY_SIZE}"; \
                        echo 
"--------------------------------------------------"; \
                        echo "NAME:${NAME}"; \
                        rm -f ${NAME}.exe; \
                        ${COMPILER} -mavx2 -std=c++17 -g 
-DARRAY_SIZE=${ARRAY_SIZE} -O${OPT}               -o ${NAME}.exe main.cpp; \
                        ${COMPILER} -mavx2 -std=c++17 -g 
-DARRAY_SIZE=${ARRAY_SIZE} -O${OPT} ${ASM_FLAGS} -o ${NAME}.s   main.cpp; \
                        ./${NAME}.exe; \
                done; \
        done; \
done
--------------------------------------------------
NAME:main_O0_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15627 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O1_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15631 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O2_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15635 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O3_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15639 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O0_g++_s36
alignof(__m256i): 32
/bin/sh: line 3: 15643 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O1_g++_s36
alignof(__m256i): 32
vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 
28 29 30 31]
--------------------------------------------------
NAME:main_O2_g++_s36
alignof(__m256i): 32
vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 
28 29 30 31]
--------------------------------------------------
NAME:main_O3_g++_s36
alignof(__m256i): 32
vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 
28 29 30 31]
GDB session
Program compiled by
g++ -mavx2 -std=c++17 -g -DARRAY_SIZE=32 -O0 -o main_O0_g++_s32.exe main.cpp
$ gdb main_O0_g++_s32
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main_O0_g++_s32...
(gdb) r
Starting program: D:\SourceCode\cpp_simdpp_crash_loadu\main_O0_g++_s32.exe
[New Thread 10112.0x4c8]
[New Thread 10112.0x409c]
[New Thread 10112.0x6bf0]
alignof(__m256i): 32

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004016ea in crash_vector_ret (a=0x66fe10 "") at main.cpp:24
24              v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
(gdb) bt
#0  0x00000000004016ea in crash_vector_ret (a=0x66fe10 "") at main.cpp:24
#1  0x000000000040179f in main (argc=1, argv=0xe64a60) at main.cpp:40
(gdb) set disassembly-flavor intel
(gdb) disas
Dump of assembler code for function crash_vector_ret(unsigned char*):
   0x00000000004016b7 <+0>:     push   rbp
   0x00000000004016b8 <+1>:     mov    rbp,rsp
   0x00000000004016bb <+4>:     sub    rsp,0x10
   0x00000000004016bf <+8>:     mov    QWORD PTR [rbp+0x10],rcx
   0x00000000004016c3 <+12>:    mov    QWORD PTR [rbp+0x18],rdx
   0x00000000004016c7 <+16>:    mov    rax,QWORD PTR [rbp+0x18]
   0x00000000004016cb <+20>:    mov    QWORD PTR [rbp-0x8],rax
   0x00000000004016cf <+24>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00000000004016d3 <+28>:    vmovdqu xmm0,XMMWORD PTR [rax]
   0x00000000004016d7 <+32>:    vinserti128 ymm0,ymm0,XMMWORD PTR [rax+0x10],0x1
   0x00000000004016de <+39>:    vmovdqa ymm1,ymm0
   0x00000000004016e2 <+43>:    vmovdqa ymm0,ymm1
   0x00000000004016e6 <+47>:    mov    rax,QWORD PTR [rbp+0x10]
=> 0x00000000004016ea <+51>:    vmovdqa YMMWORD PTR [rax],ymm0
   0x00000000004016ee <+55>:    nop
   0x00000000004016ef <+56>:    mov    rax,QWORD PTR [rbp+0x10]
   0x00000000004016f3 <+60>:    add    rsp,0x10
   0x00000000004016f7 <+64>:    pop    rbp
   0x00000000004016f8 <+65>:    ret
End of assembler dump.
(gdb) p $rax
$1 = 6749616
(gdb) p $rax/32*32
$2 = 6749600
As you can see, the $rax address is not aligned to 32 bytes and therefore the 
vmovdqa crashes.
I thought that because of the alignof(__m256i) == 32, the compiler should 
guarantee that also the return value is aligned to 32 bytes.
I might be wrong. Could you please shed more light on this issue?
Thanks for any hints.

-- 
Ticket information of MinGW - Minimalist GNU for Windows project
MinGW - Minimalist GNU for Windows Project is hosted on OSDN

Project URL: https://osdn.net/projects/mingw/
OSDN: https://osdn.net

URL for this Ticket:
    https://osdn.net/projects/mingw/ticket/39565
RSS feed for this Ticket:
    https://osdn.net/ticket/ticket_rss.php?group_id=3917&tid=39565


------------------------------

Message: 2
Date: Wed, 14 Apr 2021 03:53:49 +0900
From: MinGW Notification List <[email protected]>
To: OSDN Ticket System <[email protected]>
Subject: [MinGW-Notify] [mingw] #39565: Incorrect alignment of SIMD
        vectors returned by value from functions
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

#39565: Incorrect alignment of SIMD vectors returned by value from functions

  Open Date: 2019-09-12 15:47
Last Update: 2021-04-14 03:53

URL for this Ticket:
    https://osdn.net//projects/mingw/ticket/39565
RSS feed for this Ticket:
    https://osdn.net/ticket/ticket_rss.php?group_id=3917&tid=39565

---------------------------------------------------------------------

Last Changes/Comment on this Ticket:
2021-04-14 03:53 Updated by: michal_fapso

Comment:

Thanks a lot for your answer, @keith, and for testing it on linux in the 
cross-compile mode. I wasn't aware of any trademark issues between MinGW and 
MSYS2. I'm sorry to hear that. I thought that msys2 includes the mingw gcc 
compiler package just like it includes other open source packages. I know that 
they apply some patches on top of MinGW, but I thought that this kind of bug is 
so low level that it has to be directly in MinGW sources, not in msys2 patches. 
Eventually, it seems this bug is msys2 specific. Anyway, I like your idea of 
cross-compiling on linux for windows. Thanks, you can close this issue.

---------------------------------------------------------------------
Ticket Status:

      Reporter: michal_fapso
         Owner: (None)
          Type: Issues
        Status: Closed
      Priority: 5 - Medium
     MileStone: (None)
     Component: MSYS
      Severity: 5 - Medium
    Resolution: Invalid
---------------------------------------------------------------------

Ticket details:

Summary of the Issue
Working with AVX2 SIMD vectors. When a __m256i value is returned from a 
function, the SIMD register ymm0 is copied to memory using the aligned move 
instruction vmovdqa, so the destination memory address must be aligned to 32 
bytes. This requirement is not met in my test case and the program crashes with 
SIGSEGV. More info follows in the "Minimal Test Case" section below.
I tested it also on Ubuntu's gcc, but it didn't crash there, neither did with 
clang++ on Windows and Ubuntu, so therefore I am submitting this issue to the 
MinGW tracker. But I might be wrong and the issue may exist in gcc as well.
Host Operating System Information and Version
Windows 10 Home, Version 10.0.17134 Build 17134
with latest updated MSYS2.
GCC Version
$ gcc -v
Using built-in specs.
COLLECT_GCC=D:\msys64_latest\mingw64\bin\gcc.exe
COLLECT_LTO_WRAPPER=D:/msys64_latest/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-9.2.0/configure --prefix=/mingw64 
--with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 
--host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 
--with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include 
--libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 
--with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ 
--enable-shared --enable-static --enable-libatomic --enable-threads=posix 
--enable-graphite --enable-fully-dynamic-string 
--enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes 
--disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check 
--enable-lto --enable-libgomp --disable-multilib --enable-checking=release 
--disable-rpath --disable-win32-registry --disable-nls --disable-werror 
--disable-symvers --enable-plugin --with-libiconv --with-system-zlib 
--with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 
--with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' 
--with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 9.2.0 (Rev2, Built by MSYS2 project)
Binutils Version
$ ld -v
GNU ld (GNU Binutils) 2.32
MinGW Version
I don't know what should I look for in the mingw/include/_mingw.h file.
Build Environment
$ uname -a
MINGW64_NT-10.0-17134 MISO-PC 3.0.7-338.x86_64 2019-07-11 10:58 UTC x86_64 Msys
Minimal Self-Contained Test Case
main.cpp:
 #include <iostream> #include <immintrin.h>  #define DBG(var_name) 
std::cout<<#var_name": "<<(var_name)<<std::endl  // Output operator for vector 
std::ostream& operator<<(std::ostream& oss, const __m256i& v) {         
constexpr size_t length_bytes = 32;         unsigned char a[length_bytes];      
   _mm256_storeu_si256(reinterpret_cast<__m256i*>(a), v);         oss << "[";   
      std::string sep = "";         for (size_t i=0; i<length_bytes; i++) {     
            oss << sep << int(a[i]);                 sep = " ";         }       
  return oss << "]"; }  __m256i __attribute__ ((noinline)) 
crash_vector_ret(uint8_t* a) {         __m256i v;         v = 
_mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));         // Crash       
  return v; }  int main(int argc, char** argv) {         // Setup memory from 
which the vector will be loaded         const int a_size = ARRAY_SIZE;         
uint8_t a[a_size];         for (volatile int i=0; i<a_size; i++) {              
   a[i] = i;         }         DBG(alignof(__m256i));          __m256i vr;      
   vr = crash_vector_ret(a);         DBG(vr);          return 0; }Makefile:
 CXXFLAGS += -mavx2 CXXFLAGS += -std=c++17 CXXFLAGS += -g CXX = g++  main.exe: 
main.cpp         $(CXX) $(CXXFLAGS) -O1 -DARRAY_SIZE=32 -o $@ $<  test: 
main.cpp         for COMPILER in g++; do \                 ASM_FLAGS="-S 
-fverbose-asm -masm=intel"; \                 for ARRAY_SIZE in 32 36; do \     
                    for OPT in 0 1 2 3; do \                                 
NAME="main_O$${OPT}_$${COMPILER}_s$${ARRAY_SIZE}"; \                            
     echo "--------------------------------------------------"; \               
                  echo "NAME:$${NAME}"; \                                 rm -f 
$${NAME}.exe; \                                 $${COMPILER} $(CXXFLAGS) 
-DARRAY_SIZE=$${ARRAY_SIZE} -O$${OPT}               -o $${NAME}.exe main.cpp; \ 
                                $${COMPILER} $(CXXFLAGS) 
-DARRAY_SIZE=$${ARRAY_SIZE} -O$${OPT} $${ASM_FLAGS} -o $${NAME}.s   main.cpp; \ 
                                ./$${NAME}.exe; \                         done; 
\                 done; \         doneTrying various compilation options:
$ make test
for COMPILER in g++; do \
        ASM_FLAGS="-S -fverbose-asm -masm=intel"; \
        for ARRAY_SIZE in 32 36; do \
                for OPT in 0 1 2 3; do \
                        NAME="main_O${OPT}_${COMPILER}_s${ARRAY_SIZE}"; \
                        echo 
"--------------------------------------------------"; \
                        echo "NAME:${NAME}"; \
                        rm -f ${NAME}.exe; \
                        ${COMPILER} -mavx2 -std=c++17 -g 
-DARRAY_SIZE=${ARRAY_SIZE} -O${OPT}               -o ${NAME}.exe main.cpp; \
                        ${COMPILER} -mavx2 -std=c++17 -g 
-DARRAY_SIZE=${ARRAY_SIZE} -O${OPT} ${ASM_FLAGS} -o ${NAME}.s   main.cpp; \
                        ./${NAME}.exe; \
                done; \
        done; \
done
--------------------------------------------------
NAME:main_O0_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15627 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O1_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15631 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O2_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15635 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O3_g++_s32
alignof(__m256i): 32
/bin/sh: line 3: 15639 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O0_g++_s36
alignof(__m256i): 32
/bin/sh: line 3: 15643 Segmentation fault      ./${NAME}.exe
--------------------------------------------------
NAME:main_O1_g++_s36
alignof(__m256i): 32
vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 
28 29 30 31]
--------------------------------------------------
NAME:main_O2_g++_s36
alignof(__m256i): 32
vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 
28 29 30 31]
--------------------------------------------------
NAME:main_O3_g++_s36
alignof(__m256i): 32
vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 
28 29 30 31]
GDB session
Program compiled by
g++ -mavx2 -std=c++17 -g -DARRAY_SIZE=32 -O0 -o main_O0_g++_s32.exe main.cpp
$ gdb main_O0_g++_s32
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main_O0_g++_s32...
(gdb) r
Starting program: D:\SourceCode\cpp_simdpp_crash_loadu\main_O0_g++_s32.exe
[New Thread 10112.0x4c8]
[New Thread 10112.0x409c]
[New Thread 10112.0x6bf0]
alignof(__m256i): 32

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004016ea in crash_vector_ret (a=0x66fe10 "") at main.cpp:24
24              v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
(gdb) bt
#0  0x00000000004016ea in crash_vector_ret (a=0x66fe10 "") at main.cpp:24
#1  0x000000000040179f in main (argc=1, argv=0xe64a60) at main.cpp:40
(gdb) set disassembly-flavor intel
(gdb) disas
Dump of assembler code for function crash_vector_ret(unsigned char*):
   0x00000000004016b7 <+0>:     push   rbp
   0x00000000004016b8 <+1>:     mov    rbp,rsp
   0x00000000004016bb <+4>:     sub    rsp,0x10
   0x00000000004016bf <+8>:     mov    QWORD PTR [rbp+0x10],rcx
   0x00000000004016c3 <+12>:    mov    QWORD PTR [rbp+0x18],rdx
   0x00000000004016c7 <+16>:    mov    rax,QWORD PTR [rbp+0x18]
   0x00000000004016cb <+20>:    mov    QWORD PTR [rbp-0x8],rax
   0x00000000004016cf <+24>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00000000004016d3 <+28>:    vmovdqu xmm0,XMMWORD PTR [rax]
   0x00000000004016d7 <+32>:    vinserti128 ymm0,ymm0,XMMWORD PTR [rax+0x10],0x1
   0x00000000004016de <+39>:    vmovdqa ymm1,ymm0
   0x00000000004016e2 <+43>:    vmovdqa ymm0,ymm1
   0x00000000004016e6 <+47>:    mov    rax,QWORD PTR [rbp+0x10]
=> 0x00000000004016ea <+51>:    vmovdqa YMMWORD PTR [rax],ymm0
   0x00000000004016ee <+55>:    nop
   0x00000000004016ef <+56>:    mov    rax,QWORD PTR [rbp+0x10]
   0x00000000004016f3 <+60>:    add    rsp,0x10
   0x00000000004016f7 <+64>:    pop    rbp
   0x00000000004016f8 <+65>:    ret
End of assembler dump.
(gdb) p $rax
$1 = 6749616
(gdb) p $rax/32*32
$2 = 6749600
As you can see, the $rax address is not aligned to 32 bytes and therefore the 
vmovdqa crashes.
I thought that because of the alignof(__m256i) == 32, the compiler should 
guarantee that also the return value is aligned to 32 bytes.
I might be wrong. Could you please shed more light on this issue?
Thanks for any hints.

-- 
Ticket information of MinGW - Minimalist GNU for Windows project
MinGW - Minimalist GNU for Windows Project is hosted on OSDN

Project URL: https://osdn.net/projects/mingw/
OSDN: https://osdn.net

URL for this Ticket:
    https://osdn.net/projects/mingw/ticket/39565
RSS feed for this Ticket:
    https://osdn.net/ticket/ticket_rss.php?group_id=3917&tid=39565


------------------------------

Subject: Digest Footer

_______________________________________________
MinGW-Notify mailing list
[email protected]
https://lists.osdn.me/mailman/listinfo/mingw-notify


------------------------------

End of MinGW-Notify Digest, Vol 43, Issue 5
*******************************************

Reply via email to