Hello, 

We finally managed to prove that the problem comes from AVX-512. With QEMU we 
can enable or disable AVX-512 and run the computation with exactly the same 
guix pack, and the two settings give different results. The QEMU results with 
AVX-512 match bit for bit those from an Ubuntu laptop that has AVX-512, and 
conversely the QEMU results without AVX-512 match those from an Arch laptop 
that does not have AVX-512 either. 

However, I am not sure which dependency of MuJoCo contains the problematic 
AVX-512 code, and I can't realistically patch every dependency to build with 
-march=x86-64; that would be a very long process. I tried the Guix --tune 
option, but only two packages among the dependencies are tunable, and it 
apparently changes nothing. Could I instead use a GCC with AVX-512 support 
disabled, through a package transformation such as --with-input? What do you 
think? 

Anyway, I now have my answer: the culprit is AVX-512. This is still very much a 
surprise, as we did not believe at first that such a low-level instruction set 
could cause such a difference in the end result (this is not a 10^-6 
difference; it is on the order of 2 or 3 percent of the result). I think it is 
quite interesting to know that AVX-512 can be responsible for reproducibility 
problems. 

Cheers, 
Timothée 



From: "Timothee Mathieu" <timothee.math...@inria.fr> 
To: "Etienne B. Roesch" <etienne.roe...@gmail.com> 
Cc: "Andreas Enge" <andr...@enge.fr>, "Ludovic Courtès" <ludovic.cour...@inria.fr>, 
"Steve George" <st...@futurile.net>, "Cayetano Santos" <csant...@inventati.org>, 
"help-guix" <help-guix@gnu.org> 
Sent: Wednesday, 14 May 2025 15:38:18 
Subject: Re: Reproducibility of guix shell container across different host OS 





Hello, 

Yes, MuJoCo is packaged in Guix, and I wrote the package, so I hope it is 
correct :) 
I checked that the compiled package has exactly the same hash on all the 
computers, so it should be identical on every machine. I also tried copying a 
guix pack tar.gz file, uncompressing it and running the code, so there really 
should be no difference. 

Timothée 



From: "Etienne B. Roesch" <etienne.roe...@gmail.com> 
To: "Timothee Mathieu" <timothee.math...@inria.fr> 
Cc: "Andreas Enge" <andr...@enge.fr>, "Ludovic Courtès" <ludovic.cour...@inria.fr>, 
"Steve George" <st...@futurile.net>, "Cayetano Santos" <csant...@inventati.org>, 
"help-guix" <help-guix@gnu.org> 
Sent: Wednesday, 14 May 2025 12:19:44 
Subject: Re: Reproducibility of guix shell container across different host OS 


Very interesting. 
Could it be that MuJoCo is packaged correctly in Guix, but itself calls 
different routines depending on the CPU it runs on? (Alternatively, it might 
not be packaged "correctly" (or not at all!) and be compiled with different 
flags on different architectures, but then I think that would have shown up in 
your investigation of the diff.) 

Etienne 

On Wed, May 14, 2025 at 8:45 AM Timothee Mathieu <timothee.math...@inria.fr> wrote: 

Hello, 

After a lot of experimentation and discussion with colleagues, I found the 
culprit: it seems to be AVX-512. Apparently, the physics engine of my 
simulator uses AVX (cf. https://mujoco.readthedocs.io/en/stable/programming/index.html). 
The result of my script differs between a computer that has AVX-512 and one 
that does not (as verified with lscpu). 

I am not very familiar with such low-level instructions, but I verified that I 
got the same result on three separate AVX-512 computers, and the other result 
on five separate computers without AVX-512. 

I am not sure I understand everything about AVX. I tried to tune the 
compilation for a CPU without AVX-512, following 
https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/ 
in order to get reproducible results, but it did not work, maybe because only 
a few of the dependency packages are tunable. Is there a way to force 
everything to use AVX and not AVX-512? I understand that AVX-512 is meant to be 
faster, but in my case I want reproducibility before speed. 

Thanks, 
Timothée 


----- Original Message ----- 
> From: "Timothee Mathieu" <timothee.math...@inria.fr> 
> To: "Andreas Enge" <andr...@enge.fr> 
> Cc: "Ludovic Courtès" <ludovic.cour...@inria.fr>, "Steve George" <st...@futurile.net>, 
> "Cayetano Santos" <csant...@inventati.org>, "help-guix" <help-guix@gnu.org> 
> Sent: Wednesday, 7 May 2025 09:34:44 
> Subject: Re: Reproducibility of guix shell container across different host OS 

> I checked, and I am now convinced that the fault lies in the physics 
> simulator: I tried other, simpler reinforcement learning environments and 
> everything was reproducible, so it is not due to the neural network part 
> (which is already impressive, I guess, as neural network libraries tend to 
> be quite a mess reproducibility-wise). 
> 
> So something weird seems to be going on with MuJoCo, the physics simulator 
> for which we wrote a package. It also seems to come from the interaction 
> between MuJoCo and the PyTorch neural network, because using random actions 
> seems reproducible. 
> I guess this could be due to floating-point rounding error, although the 
> difference seems too large for that. The computation is quite long, so maybe 
> the errors amplify, but I am a bit doubtful because I found complete 
> reproducibility between my laptop and some powerful servers with very 
> different hardware: wouldn't the results differ on very different hardware 
> if the problem were rounding error? 
> 
> Is there a way to check whether this is due to floating-point rounding 
> error? I tried using Float64 instead of Float32, and the results are still 
> not reproducible (although the value changes a little, on the order of 
> 10^{-5}). 
> 
> Thanks, 
> Timothée 
> 
> ----- Original Message ----- 
>> From: "Andreas Enge" <andr...@enge.fr> 
>> To: "Ludovic Courtès" <ludovic.cour...@inria.fr> 
>> Cc: "Timothee Mathieu" <timothee.math...@inria.fr>, "Steve George" <st...@futurile.net>, 
>> "Cayetano Santos" <csant...@inventati.org>, "help-guix" <help-guix@gnu.org> 
>> Sent: Tuesday, 6 May 2025 10:30:12 
>> Subject: Re: Reproducibility of guix shell container across different host OS 
> 
>> On Tue, May 06, 2025 at 09:26:51AM +0200, Ludovic Courtès wrote: 
>>> Do you have evidence that the problem is a leak like this? Or could it 
>>> be that the Python code being run is non-deterministic? 
>>> If you run ‘guix shell -CN --no-cwd coreutils’, you can see with ‘ls’ 
>>> etc. that nothing leaks from the host OS (apart of course from the 
>>> kernel). 
>> 
>> Or maybe the hardware "leaks"? Are the two machines exactly identical, 
>> in particular, do they have the exact same processor? Since the 
>> differences involve floating point computations, I would not be 
>> surprised if the precise processor architecture made a difference. 
>> 
>> Someone mentioned the IEEE-754 standard in the thread; it mandates that 
>> basic arithmetic operations follow precise, deterministic semantics, but 
>> it does not require this of trigonometric functions. 
>> 
>> Also, if I remember correctly, special flags are required to make GCC emit 
>> IEEE-conforming code; otherwise the old but faster x86 80-bit extended 
>> precision built into the processor is used. I have seen a case where 
>> *printing* a variable changed its value, because this meant it would be 
>> moved from an 80-bit processor register to a 64-bit memory location. 
>> In other words, something like the following code: 
>> double x = ...; 
>> if (x != some value) { 
>>     printf ("%f", x); 
>>     if (x != some value) // the same value as above, of course 
>>         printf ("0"); 
>>     else 
>>         printf ("1"); 
>> } 
>> would print x, followed by "1"... 
>> 
>> See this thread: 
>> https://lists.gnu.org/archive/html/guix-devel/2023-03/msg00277.html 
>> and commit 098bd280f82350073e8280e37d56a14162eed09c. 
>> 
>> If you want deterministic, reproducible floating-point computations, 
>> I am afraid you would need to use the GNU MPFR and GNU MPC libraries 
>> (comparatively slow at low precision), or use interval arithmetic from 
>> FLINT and replace exact comparisons by looking at intersections of intervals. 
>> 
>> Andreas 

