Hi!

On 18.04.2023 20:14, Maxim Dounin wrote:
Hello!

On Tue, Apr 18, 2023 at 10:50:01AM +0100, Vadim Fedorenko wrote:

On 18.04.2023 02:54, Maxim Dounin wrote:
Hello!

On Tue, Apr 18, 2023 at 02:07:06AM +0300, Vadim Fedorenko via nginx-devel wrote:

GCC version 11 and newer use more aggressive way to eliminate dead stores
which ends up removing ngx_memzero() calls in several places. Such optimization
affects calculations of md5 and sha1 implemented internally in nginx. The
effect could be easily observed by adding a random data to buffer array in
md5_init() or sha1_init() functions. With this simple modifications the result
of the hash computation will be different each time even though the provided
data to hash is not changed.

If calculations of md5 and sha1 are affected, this means that the
stores in question are not dead, and they shouldn't be eliminated
in the first place.  From your description this looks like a bug
in the compiler in question.

Yeah, these ngx_memzero()s must not be dead, but according to the standart they
are. In md5_final() the function is called this way:
ngx_memzero(&ctx->buffer[used], free - 8);
That means that a new variable of type 'char *' is created with the life time
scoped to the call to ngx_memzero(). As the result of of the function is ignored
explicitly, no other parameters are passed by pointer, and the variable is not
accessed anywhere else, the whole call can be optimized out.

The pointer is passed to the function, and the function modifies
the memory being pointed to by the pointer.  While the pointer is
not used anywhere else and can be optimized out, the memory it
points to is used elsewhere, and this modification cannot be
optimized out, so it is incorrect to remove the call according to
my understanding of the C standard.

If you still think it's correct and based on the C standard,
please provide relevant references (and quotes) which explain why
these calls can be optimized out.

Alternatively, this can be a bug in nginx code which makes the
compiler think that it can eliminate these ngx_memzero() calls - for
example, GCC is known to do such things if it sees an undefined
behaviour in the code.

There is no undefined behavior unfortunately, everything in this place is well
defined.

Well, I don't think so.  There is a function call, and it cannot
be eliminated by the compiler unless the compiler thinks that the
results of the function call do not affect the program execution
as externally observed.  Clearly the program execution is affected
(as per your claim).  This leaves us the two possible
alternatives:

- There is a bug in the compiler, and it incorrectly thinks that
   the function call do not affect the program execution.

- There is a bug in the code, and it triggers undefined behaviour,
   so the compiler might not actually know what happens in the code
   (because it not required to do anything meaningful in case of
   undefined behaviour, and simply assume it should never happen).

Just in case, the actual undefined behaviour might occur in the
ngx_md5_body() function due to strict-aliasing rules being broken
by the optimized GET() macro on platforms without strict alignment
requirements if the original data buffer as provided to
ngx_md5_update() cannot be aliased by uint32_t.  See this commit in
the original repository of the md5 code nginx uses:

https://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/md5.c.diff?r1=1.14;r2=1.15

But nginx only uses ngx_md5_update() with text buffers, so
strict-aliasing rules aren't broken.

You may want to elaborate more on how to reproduce this, and, if
possible, how to build a minimal test case which demonstrates the
problem.

Sure, let's elaborate a bit. To reproduce the bug you can simply apply the diff:

diff --git a/src/core/ngx_md5.c b/src/core/ngx_md5.c
index c25d0025d..67cc06438 100644
--- a/src/core/ngx_md5.c
+++ b/src/core/ngx_md5.c
@@ -24,6 +24,7 @@ ngx_md5_init(ngx_md5_t *ctx)
       ctx->d = 0x10325476;

       ctx->bytes = 0;
+    getrandom(ctx->buffer, 64, 0);
   }


Note that this won't compile, it also needs "#include <sys/random.h>".

This code will emulate the garbage for the stack-allocated 'ngx_md5_t md5;' in
ngx_http_file_cache_create_key when nginx is running under the load. Then you
can use simple configuration:

upstream test_001_origin {
    server 127.0.0.1:8000;
}

proxy_cache_path /var/cache/nginx/test-001 keys_zone=test_001:10m max_size=5g
inactive=24h levels=1:2 use_temp_path=off;

server {
    listen 127.0.0.1:8000;

    location = /plain {
      return 200;
    }

}

server {
    listen 127.0.0.1:80;

    location /oplain {
       proxy_cache test_001;
       proxy_cache_key /oplain;
       proxy_pass http://test_001_origin/plain/;
    }
}


Every time you call 'curl http://127.0.0.1/oplain'  a new cache file will be
created, but the md5sum of the file will be the same, meaining that the key
stored in the file is absolutely the same.

Note that the exact configuration will make "GET /plain/" requests
to upstream server, resulting in 404 and nothing cached.  I've
fixed this to actually match "location = /plain" and added
"proxy_cache_valid 200 1h;" to ensure caching will actually work.

Still, I wasn't able to reproduce the issue you are seeing on
FreeBSD 12.4 with gcc12, neither with default compilation flags as
used by nginx, nor with --with-cc-opt="-flto -O2" and
--with-ld-opt="-flto -O2".

On RHEL 9 (Red Hat Enterprise Linux release 9.1 (Plow) from
redhat/ubi9 image in Docker) with gcc11 (gcc version 11.3.1
20220421 (Red Hat 11.3.1-2) (GCC), as installed with "yum install
gcc") I wasn't able to reproduce this as well (also tested both
with default compilation flags as provided by nginx, and cc/ld
options "-flto -O2").

You may want to provide more details on how to reproduce this.
Some exact steps you've actually tested might the way to go.

Changing the code to use current implementation
of ngx_explicit_memzero() doesn't help because of link-time optimizations
enabled in RHEL 9 and derivatives. Glibc 2.34 found in RHEL 9 provides
explicit_bzero() function which should be used to avoid such optimization.
ngx_explicit_memzero() is changed to use explicit_bzero() if possible.

The ngx_explicit_memzero() function is to be used when zeroed data
are indeed not used afterwards, for example, to make sure
passwords are actually eliminated from memory.  It shouldn't be
used instead of a real ngx_memzero() call - doing so might hide
the problem, which is either in the compiler or in nginx, but
won't fix it.

In this case the nginx code should be fixed to avoid partial memory fillings,
but such change will come with performance penalty, especially on the CPUs
without proper `REP MOVSB/MOVSD/MOVSQ` implementation. Controlled usage of
explicit zeroing is much better is this case.

You may want to elaborate on what "nginx code should be fixed to
avoid partial memory fillings" means and why it should be
fixed/avoided.

As for using explicit_bzero() for it, we've looked into various
OS-specific solutions, though there are too many variants out
there, so it was decided that having our own simple implementation
is a better way to go.  If it doesn't work in the particular
setup, it might make sense to adjust our implementation - but
given the above, it might be the same issue which causes the
original problem.

Unfortunately, the memory barrier trick is not working anymore for linker-time
optimizations. Linker has better information about whether the stored
information is used again or not. And it will remove memset in such
implementation, and it will definitely affected security-related code you
mentioned above.

Without link-time optimization, just a separate compilation unit
with a function is more than enough.

The ngx_memory_barrier() is additionally used as in many practical
cases it introduces a compiler barrier, and this also defeats
link-time optimization.  This might not be the case for GCC
though, as with GCC we currently use __sync_synchronize() for
ngx_memory_barrier().  Adding an explicit compiler barrier (asm
with the "memory" clobber should work for most compilers, but not
all) might be the way to go if it's indeed the case.

It does seem to work with GCC with link-time optimizations enabled
though, as least in the RHEL 9 build with gcc11 "-flto -O2".  I'm
seeing this in the disassemble of ngx_http_auth_basic_handler():

    0x00000000004709c6 <+918>:    rep stos %rax,%es:(%rdi)
    0x00000000004709c9 <+921>:    lock orq $0x0,(%rsp)

So it looks like ngx_explicit_memzero() is inlined and optimized
to use "rep stos" instead of memset() call, but not eliminated.

explicit_bzero() function is available in well-loved *BSD
systems now and is a proper way to do cleaning of the artifacts, doesn't matter
which implementation is used in the specific system.

If I recall correctly, when I last checked there were something
like 5 different interfaces out there, including explicit_bzero(),
explicit_memset(), memset_s(), and SecureZeroMemory().  With
memset_s() being required by C11 standard, but with absolutely
brain-damaged interface.  (It looks like now there is also
memset_explicit(), which is going to become a standard in C23.)

As such, the decision was to use our own function which does the
trick in most practical cases.  And if for some reason it doesn't,
this isn't a big issue: that's a mitigation technique at most.


Looks like I found the root cause of the issue in the code added on top of nginx
implementation of md5, which is using “type-punning” for optimization reasons,
but ended with UB when NGX_HAVE_LITTLE_ENDIAN && NGX_HAVE_NONALIGNED are
defined. Sorry for the noise and many thanks to Alejandro for help, hint with
-fstrict-aliasing reminded me this very old change in the code.

All best,
Vadim
_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel

Reply via email to