> If I understand correctly, then the simplest correct fix for this problem is
> to modify ld.so such that it preserves all caller-saved registers when
> calling out to functions like realloc(3). In my opinion, ld.so probably
> shouldn't be using the normal malloc at all (instead use a directly embedded
> minimal malloc implementation), because there are lots of mind-boggling ways
> bootstrapping can fail, but that's a more involved change.
I don't think you can count on a fix on the ld.so side. Have you tried running
the glibc tests with jemalloc?
>
> Can you disable the gcc optimization (-fno-tree-vectorize) when building
> jemalloc? You won't hit actual floating point code in jemalloc unless you
> enable heap profiling, so that should prevent XMM usage in all the relevant
> jemalloc code. If that works okay, I'll need to get a better understanding
> of when to automatically configure the gcc flags that way when building
> jemalloc.
I know that --disable-stats at least lets my computation workflows run fine,
but that doesn't make me confident about jemalloc in general.
If SSE/AVX cannot be used in a malloc implementation, the simple idea is to
disable it:
gcc -mno-sse -fvisibility=hidden -fPIC -DPIC -c -D_GNU_SOURCE -D_REENTRANT
-Iinclude -Iinclude -o src/jemalloc.pic.o src/jemalloc.c
In file included from include/jemalloc/internal/jemalloc_internal.h:1037:0,
from src/jemalloc.c:2:
include/jemalloc/internal/prof.h: In function 'prof_sample_threshold_update':
include/jemalloc/internal/prof.h:349:40: error: SSE register return with SSE
disabled
prof_tdata->threshold = (uint64_t)(log(u) /
With SSE disabled, jemalloc cannot be compiled, even with --disable-stats.
./include/jemalloc/internal/prof.h
349 prof_tdata->threshold = (uint64_t)(log(u) /
350 log(1.0 - (1.0 / (double)((uint64_t)1U << opt_lg_prof_sample))))
351 + (uint64_t)1U;
This requires SSE registers: log() takes and returns a double.
From the Microsoft x86_64 calling convention (the same applies on Linux):
"All floating point operations are done using the 16 XMM registers."
The same goes for argument passing: all floats/doubles go through XMM
registers. So these pieces of code _require_ XMM registers. We just need to
make sure they are never in the ld.so call path.
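To make the calling-convention point concrete, here is a minimal sketch. The function names are hypothetical and not part of jemalloc. On x86-64 Linux a double argument and a double return value both travel in xmm0, so the first function should be rejected by gcc -mno-sse with the same kind of error as shown above, while the integer-only one stays in general-purpose registers.

```c
#include <stdint.h>

/* Hypothetical helper, not jemalloc code. On x86-64 Linux (System V ABI)
 * `x` arrives in xmm0 and the result is returned in xmm0, so even this
 * trivial function depends on SSE registers being available. */
double halve(double x)
{
    return x / 2.0;
}

/* Integer-only counterpart: the argument and the result are passed in
 * general-purpose registers, so parameter passing needs no XMM register. */
uint64_t halve_u64(uint64_t x)
{
    return x >> 1;
}
```

Any code that ld.so may reach during lazy binding would have to look like the second function, not the first.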
My suggestions:
- By default, jemalloc should be compiled without SSE/AVX. This also means
that options such as "stats" should be off by default. A special note must be
added about the possible problems.
- Make sure that jemalloc passes the same tests as glibc, or similar ones, on
multiple platforms.
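As a rough sketch of how a build could check the first suggestion, the snippet below reports which vector extensions the compiler was allowed to use. The enum and function names are hypothetical. Only the __SSE__, __SSE2__ and __AVX__ macros come from the compiler (GCC and Clang predefine them when the corresponding extension is enabled).

```c
/* Hypothetical names; only the __SSE__, __SSE2__ and __AVX__ macros are
 * provided by the compiler. */
enum {
    SIMD_SSE  = 1 << 0,
    SIMD_SSE2 = 1 << 1,
    SIMD_AVX  = 1 << 2
};

/* Returns a bitmask of the vector extensions enabled for this
 * translation unit; 0 means none of the three was enabled. */
int simd_build_flags(void)
{
    int flags = 0;
#ifdef __SSE__
    flags |= SIMD_SSE;
#endif
#ifdef __SSE2__
    flags |= SIMD_SSE2;
#endif
#ifdef __AVX__
    flags |= SIMD_AVX;
#endif
    return flags;
}
```

A test suite could assert that this returns 0 for the objects that ld.so may call into. It only reflects the compiler flags, though, and does not prove that the generated code avoids XMM registers.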
> What about setting LD_BIND_NOW=1 in the environment so that all the
> register-corrupting badness happens prior to application execution (when it
> doesn't matter)?
It probably depends on the application. It would increase the application's
start-up time, which is not always acceptable.
My current thinking is that jemalloc shouldn't try to work around it, but
should do QA similar to glibc's and get rid of SSE/AVX register use in the
functions that ld.so may call.
david
> Thanks,
> Jason