https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81201

            Bug ID: 81201
           Summary: The final asm code doesn't check if a function changes
                    the value of ebx, resulting in segmentation fault.
           Product: gcc
           Version: 6.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arget at autistici dot org
  Target Milestone: ---

Created attachment 41628
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41628&action=edit
GCC preprocessed code of the source code described above on raspbian.

Hi, compiling with gcc version 6.3.0:
arget@plata:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18'
--with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie
--with-system-zlib --disable-browser-plugin --enable-java-awt=gtk
--enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre
--enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--with-target-system-zlib --enable-objc-gc=auto --enable-multiarch
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18) 
arget@plata:~$ uname -a
Linux plata 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u1 (2017-06-18) x86_64
GNU/Linux

There is a problem with the following code. It is my own implementation of
the ChaCha20 stream cipher used as a CSPRNG (Cryptographically Secure
Pseudo-Random Number Generator, something like random() but safer):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NONCE "\x00\x00\x00\x09\x00\x00\x00\x4a\x00\x00\x00\x00"

#define RtoL(x, n) \
        ((x << n) | (x >> (32 - n)))

#define QR(a, b, c, d) \
    estado[a] += estado[b]; estado[d] ^= estado[a]; estado[d] = RtoL(estado[d], 16); \
    estado[c] += estado[d]; estado[b] ^= estado[c]; estado[b] = RtoL(estado[b], 12); \
    estado[a] += estado[b]; estado[d] ^= estado[a]; estado[d] = RtoL(estado[d], 8); \
    estado[c] += estado[d]; estado[b] ^= estado[c]; estado[b] = RtoL(estado[b], 7);

// "expand 32-byte k"
static const uint32_t chachaConst[4] = {0x61707865,
                                        0x3320646e,
                                        0x79622d32,
                                        0x6b206574};
static uint32_t chachaKey[8],
                chachaCount = 0,
                chachaNonce[3];
static uint8_t quedanPorLeer = 0,
               chachaRandomOutput[64];

static void chacha()
{
    uint32_t estado[16];
    uint32_t i;
    memcpy(estado, chachaConst, 16);
    memcpy(&estado[4], chachaKey, 64);
    chachaCount++;
    estado[12] = chachaCount;
    memcpy(&estado[13], chachaNonce, 12);
    memcpy(chachaRandomOutput, estado, 64);

    for(i = 0; i < 10; i++)
    {
        QR(0, 4,  8, 12)
        QR(1, 5,  9, 13)
        QR(2, 6, 10, 14)
        QR(3, 7, 11, 15)
        QR(0, 5, 10, 15)
        QR(1, 6, 11, 12)
        QR(2, 7,  8, 13)
        QR(3, 4,  9, 14)
    }

    uint32_t *q = (uint32_t*)chachaRandomOutput;
    for(i = 0; i < 64; i++)
        q[i] += estado[i];
}

void chachaSeed(const uint8_t s[32])
{
    memcpy(chachaKey, s, 32);
    memcpy(chachaNonce, NONCE, 12);
    chachaCount = 0;
    quedanPorLeer = 0;
}

uint8_t chachaGet()
{
    if(!quedanPorLeer)
    {
        chacha();
        quedanPorLeer = 64;
    }
    return chachaRandomOutput[64 - (quedanPorLeer--)];
}

int main()
{
    chachaSeed((const uint8_t*)
        "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
        "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f");
    int i;
    for(i = 0; i < 64; i++)
        printf("%c", chachaGet());
}

The ChaCha20 CSPRNG works similarly to random() and srandom(): chachaSeed()
sets a 32-byte seed, and each call to chachaGet() returns the next random
byte.
The nonce and seed values are the inputs used in the test vector of
RFC 7539 (https://tools.ietf.org/html/rfc7539#section-2.3.2).
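
The quarter round that the QR macro above performs can also be checked in
isolation. The following is only a sketch: it assumes the quarter-round test
vector from RFC 7539 section 2.1.1, and the names rotl32, quarter_round and
quarter_round_matches_rfc are hypothetical, not part of a.c.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on four words (RFC 7539, section 2.1).
   Same operations as the QR macro, but on explicit pointers. */
void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Returns 1 if the quarter round reproduces the RFC 7539 2.1.1 vector,
   0 otherwise. */
int quarter_round_matches_rfc(void)
{
    uint32_t a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567;

    quarter_round(&a, &b, &c, &d);
    return a == 0xea2a92f4 && b == 0xcb1cf8ce &&
           c == 0x4581472e && d == 0x5881c4bb;
}

/* Hypothetical self-test entry point: aborts if the vector does not match. */
void quarter_round_selftest(void)
{
    assert(quarter_round_matches_rfc());
}
```

Calling quarter_round_selftest() from a small main() is enough to exercise
it. A check like this covers only the arithmetic, not how the state array is
filled or read back.
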
The program works great compiled for 64 bits:

arget@plata:~$ gcc a.c -o a
arget@plata:~$ file a
a: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32,
BuildID[sha1]=a11bd03a3633d945c5ef7f20baa85b313f65dac6, not stripped
arget@plata:~$ ./a | xxd
00000000: 10f1 e7e4 d13b 5915 500f dd1f a320 71c4  .....;Y.P.... q.
00000010: c7d1 f4c7 33c0 6803 0422 aa9a c3d4 6c4e  ....3.h.."....lN
00000020: d282 6446 079f aa09 14c2 d705 d98b 02a2  ..dF............
00000030: b512 9cd1 de16 4eb9 cbd0 83e8 a250 3c4e  ......N......P<N

But compiled for 32 bits, it crashes with SIGSEGV:
arget@plata:~$ gcc a.c -o a -m32
arget@plata:~$ ./a
Violación de segmento

[Sorry, my system is in Spanish; the message means "Segmentation fault".]
I have tried -Wall -Wextra and -fno-strict-aliasing -fwrapv, but gcc
doesn't find anything wrong in my code, and the problem persists.
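
One further check, beyond what the warning flags report, is to compare the
byte counts used in chacha() with the sizes of the objects involved. The
sketch below assumes only the declarations shown in the listing
(uint32_t estado[16], the memcpy lengths and the final loop bound). The helper
copy_fits and the other function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of "uint32_t estado[16]" from the listing. */
#define ESTADO_BYTES (16 * sizeof(uint32_t))

/* Hypothetical helper: returns 1 if n bytes starting at byte offset off
   stay inside an object of the given size, 0 otherwise. */
int copy_fits(size_t size, size_t off, size_t n)
{
    return off <= size && n <= size - off;
}

/* memcpy(estado, chachaConst, 16) */
int constants_copy_fits(void)
{
    return copy_fits(ESTADO_BYTES, 0, 16);
}

/* memcpy(&estado[4], chachaKey, 64) */
int key_copy_fits(void)
{
    return copy_fits(ESTADO_BYTES, 4 * sizeof(uint32_t), 64);
}

/* memcpy(&estado[13], chachaNonce, 12) */
int nonce_copy_fits(void)
{
    return copy_fits(ESTADO_BYTES, 13 * sizeof(uint32_t), 12);
}

/* for(i = 0; i < 64; i++) q[i] += estado[i];  touches 64 words */
int final_loop_fits(void)
{
    return copy_fits(ESTADO_BYTES, 0, 64 * sizeof(uint32_t));
}
```

With the sizes in the listing, key_copy_fits() and final_loop_fits() return
0: the key copy asks for 64 bytes where 48 remain after estado[4], and the
final loop indexes 64 words of a 16-word array. Writes past the end of estado
land in neighbouring stack memory, which may include saved registers, so
these two spots are worth re-checking alongside the generated code.
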
Debugging it, I found that the program breaks in the function chachaGet():
00000b8f <chachaGet>:
 b8f:   55                      push   ebp
 b90:   89 e5                   mov    ebp,esp
 b92:   53                      push   ebx
 b93:   83 ec 04                sub    esp,0x4
 b96:   e8 05 f9 ff ff          call   4a0 <__x86.get_pc_thunk.bx>
 b9b:   81 c3 65 14 00 00       add    ebx,0x1465
 ba1:   0f b6 83 28 00 00 00    movzx  eax,BYTE PTR [ebx+0x28]
 ba8:   84 c0                   test   al,al
 baa:   75 0c                   jne    bb8 <chachaGet+0x29>
 bac:   c6 83 28 00 00 00 40    mov    BYTE PTR [ebx+0x28],0x40
 bb3:   e8 18 fa ff ff          call   5d0 <chacha>
 bb8:   0f b6 83 28 00 00 00    movzx  eax,BYTE PTR [ebx+0x28]
 bbf:   8d 50 ff                lea    edx,[eax-0x1]
 bc2:   88 93 28 00 00 00       mov    BYTE PTR [ebx+0x28],dl
 bc8:   0f b6 c0                movzx  eax,al
 bcb:   ba 40 00 00 00          mov    edx,0x40
 bd0:   29 c2                   sub    edx,eax
 bd2:   8d 83 80 00 00 00       lea    eax,[ebx+0x80]
 bd8:   0f b6 04 10             movzx  eax,BYTE PTR [eax+edx*1]
 bdc:   83 c4 04                add    esp,0x4
 bdf:   5b                      pop    ebx
 be0:   5d                      pop    ebp
 be1:   c3                      ret

The program breaks at 0xbb8, "movzx  eax,BYTE PTR [ebx+0x28]". This
instruction assumes that the ebx register still holds the value set up after
the call to __x86.get_pc_thunk.bx, but the call to chacha has changed it,
leaving ebx = 0.
I worked around it by saving ebx on the stack before the call to chacha and
restoring it from the stack after chacha returns:
uint8_t chachaGet()
{
    if(!quedanPorLeer)
    {
        __asm__("push %ebp");
        chacha();
        __asm__("pop  %ebp");
        quedanPorLeer = 64;
    }
    return chachaRandomOutput[64 - (quedanPorLeer--)];
}

Now the code works:
arget@plata:~$ gcc a.c -o a -m32
arget@plata:~$ ./a | xxd
00000000: 10f1 e7e4 d13b 5915 500f dd1f a320 71c4  .....;Y.P.... q.
00000010: c7d1 f4c7 33c0 6803 0422 aa9a c3d4 6c4e  ....3.h.."....lN
00000020: d282 6446 079f aa09 14c2 d705 d98b 02a2  ..dF............
00000030: b512 9cd1 de16 4eb9 cbd0 83e8 a250 3c4e  ......N......P<N
arget@plata:~$ 

But, as you can understand, this isn't a real solution, especially because it
isn't portable.
The problem also occurs when compiling for ARM on Raspbian:

pi@raspberrypi:~ $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.9/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 4.9.2-10'
--with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.9 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libitm
--disable-libquadmath --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-armhf/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-armhf
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-armhf
--with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --disable-sjlj-exceptions --with-arch=armv6
--with-fpu=vfp --with-float=hard --enable-checking=release
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.9.2 (Raspbian 4.9.2-10) 
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.9.28+ #998 Mon May 15 16:50:35 BST 2017 armv6l GNU/Linux
pi@raspberrypi:~ $ gcc a.c -o a
pi@raspberrypi:~ $ ./a
Segmentation fault
pi@raspberrypi:~ $ 
Since I don't know ARM assembly well enough, I can't repeat the analysis I
did for x86.
On x86-64 there isn't any error because the address of "quedanPorLeer" is
computed as an offset from the rip register.

As requested, I am attaching the preprocessed *.i file from the ARM build on
Raspbian.
