> If I understand correctly, then the simplest correct fix for this problem is 
> to modify ld.so such that it preserves all caller-saved registers when 
> calling out to functions like realloc(3).  In my opinion, ld.so probably 
> shouldn't be using the normal malloc at all (instead use a directly embedded 
> minimal malloc implementation), because there are lots of mind-boggling ways 
> bootstrapping can fail, but that's a more involved change.

I don't think you can count on any fix on the ld.so side. Have you tried 
running the glibc tests with jemalloc?

> 
> Can you disable the gcc optimization (-fno-tree-vectorize) when building 
> jemalloc?  You won't hit actual floating point code in jemalloc unless you 
> enable heap profiling, so that should prevent XMM usage in all the relevant 
> jemalloc code.  If that works okay, I'll need to get a better understanding 
> of when to automatically configure the gcc flags that way when building 
> jemalloc.

I know that --disable-stats at least lets my computation workflows run 
fine, but that doesn't mean I am confident about jemalloc.

If SSE/AVX cannot be used in a malloc implementation, the simple idea is to 
disable it:

gcc -mno-sse -fvisibility=hidden -fPIC -DPIC -c -D_GNU_SOURCE -D_REENTRANT 
-Iinclude -Iinclude -o src/jemalloc.pic.o src/jemalloc.c
In file included from include/jemalloc/internal/jemalloc_internal.h:1037:0,
                 from src/jemalloc.c:2:
include/jemalloc/internal/prof.h: In function 'prof_sample_threshold_update':
include/jemalloc/internal/prof.h:349:40: error: SSE register return with SSE 
disabled
  prof_tdata->threshold = (uint64_t)(log(u) /

Disabling SSE does not allow jemalloc to be compiled, even with --disable-stats.

./include/jemalloc/internal/prof.h

349         prof_tdata->threshold = (uint64_t)(log(u) /
350             log(1.0 - (1.0 / (double)((uint64_t)1U << opt_lg_prof_sample))))
351             + (uint64_t)1U;

This requires an SSE register.

From the Microsoft x86_64 calling conventions (the same applies on Linux):

"All floating point operations are done using the 16 XMM registers."

The same goes for argument passing: all floats/doubles go through XMM 
registers.

So these pieces of code _require_ XMM registers. We just need to make sure 
those pieces are never in the ld.so call path.

My suggestions:
- By default, jemalloc should be compiled without SSE/AVX. This also means 
that options such as "stats" should be off by default. A special note must 
be added about the possible problems.
- Make sure that jemalloc passes the same or a similar test as in glibc on 
multiple platforms.

> What about setting LD_BIND_NOW=1 in the environment so that all the 
> register-corrupting badness happens prior to application execution (when it 
> doesn't matter)?

It probably depends on the application. It would increase the application's 
start time, which is not always acceptable.

My current thinking is that jemalloc shouldn't try to work around it, but 
should do QA similar to glibc's and get rid of SSE/AVX register use in 
functions that may be called by ld.so.

david

> Thanks,
> Jason

_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
