Hello,

At last we managed to solve the problem! The solution is a bit hacky, but for 
now it works.

After further investigation, it seems that the problem was not with the 
simulator itself but with the fact that it simulates contacts, which are very 
sensitive to even a small difference in the input actions. I discovered that 
PyTorch (and maybe other dependencies) has a reproducibility problem on the 
order of 1e-5 when running on AVX512 compared to AVX2. I first tried to solve 
the problem by disabling AVX512 at the level of PyTorch, but it did not work. 
The PyTorch developers said this may be because some components dispatch 
computation to MKL-DNN, so I tried to disable AVX512 in MKL as well, and still 
the results were not reproducible; I also tried to deactivate it in OpenMPI, 
without success.
I finally concluded that there was a problem with AVX512 somewhere in the 
dependency graph, but I gave up identifying where, as this seems very 
complicated.
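
For reference, the kind of switches I am talking about look roughly like this 
(ATEN_CPU_CAPABILITY is PyTorch's documented dispatch override and 
MKL_ENABLE_INSTRUCTIONS / MKL_CBWR are the MKL ones; the script name is just a 
placeholder, and none of this was enough in my case):

    # Cap PyTorch's CPU kernel dispatch at AVX2
    export ATEN_CPU_CAPABILITY=avx2
    # Cap MKL's dispatch and ask for conditional numerical reproducibility
    export MKL_ENABLE_INSTRUCTIONS=AVX2
    export MKL_CBWR=AVX2
    python run_simulation.py   # placeholder for the actual experiment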

Instead, I found a tool, https://github.com/twosigma/libvirtcpuid/, which 
allows me to mask AVX512 from the process, and this worked! I used it to 
modify glibc with a graft in the guix shell command, which disables AVX512 
inside the shell and gives the exact same results on both AVX512 and 
non-AVX512 computers, without much overhead (there is no VM; the only 
difference seems to be a slight speedup when AVX512 is available, as expected).
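
A convenient sanity check (with recent PyTorch versions) is to ask the 
dispatcher which instruction set it actually picked inside the shell; with the 
masked glibc it should no longer report AVX512:

    # Inside the container, check which instruction set PyTorch dispatches to
    python -c 'import torch; print(torch.backends.cpu.get_cpu_capability())'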

I guess all of this should be a cautionary tale: sometimes it is necessary to 
look carefully at the CPU flags in order to get reproducibility with guix 
shell. Ideally, when we want something to be reproducible, we may want to 
communicate the CPU flags in addition to the manifest and channels files (or 
at least test that changing the flags does not change the results).
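
Even something as simple as recording the flags line next to the manifest and 
channels files would help, for example:

    # Record the host CPU flags alongside manifest.scm and channels.scm
    grep -m1 '^flags' /proc/cpuinfo > cpu-flags.txt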

To get reproducible results, we packaged libvirtcpuid with Guix, but in a 
pretty hacky way that works only when called through `guix shell -CF`, in 
order to recover an FHS file system. It would be great if someday a feature to 
mask certain CPU flags made its way into guix shell to improve 
reproducibility, but I guess my case of a big difference due to AVX512 is an 
edge case that does not happen often (?).
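
For the curious, the invocation looks roughly like this (glibc-virtcpuid is 
just an illustrative name for our local package wrapping libvirtcpuid's 
patched loader, ./my-packages and run_experiment.py are placeholders):

    # -C: container, -F: FHS layout (needed by our hacky libvirtcpuid package),
    # -L: load path for the local package definitions,
    # --with-graft: replace glibc everywhere with the AVX512-masking variant.
    guix shell -CF -L ./my-packages \
         --with-graft=glibc=glibc-virtcpuid \
         python python-pytorch -- python run_experiment.py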

Best,
Timothée
----- Original Message -----
> From: "Ludovic Courtès" <ludovic.cour...@inria.fr>
> To: "Timothee Mathieu" <timothee.math...@inria.fr>
> Cc: "Etienne B. Roesch" <etienne.roe...@gmail.com>, "Andreas Enge"
> <andr...@enge.fr>, "Steve George" <st...@futurile.net>, "Cayetano Santos"
> <csant...@inventati.org>, "help-guix" <help-guix@gnu.org>
> Sent: Wednesday, May 28, 2025 16:14:27
> Subject: Re: Reproducibility of guix shell container across different host OS

> Hi,
> 
> Timothee Mathieu <timothee.math...@inria.fr> writes:
> 
>> We finally managed to prove that the problem was with avx-512 by using
>> qemu we can enable/disable avx-512 and do the computation with exactly
>> the same guix pack and recover that this gives different results. The
>> qemu avx-512 results match bitwise the results from laptop on Ubuntu
>> that have avx-512 and conversely that the qemu without avx-512 have
>> the same results as the Arch laptop that also does not have AVX-512.
> 
> Are you saying that the same binaries in the same pack use AVX-512 when
> available and don’t use it otherwise?
> 
> This is the “ideal” load-time adjustment¹ but then you could run into
> the kind of numerical issue that you experience.  It’s a problem that I
> would discuss with the authors of the library, perhaps starting with
> mujoco itself.
> 
> Interesting case anyway!
> 
> Ludo’.
> 
> ¹ Discussed in
>  <https://hpc.guix.info/blog/2018/01/pre-built-binaries-vs-performance/>
>   and used by libraries like glibc, OpenBLAS, and more.
