[replying on the debian-med list with permission. Please keep Martin and Milot CC'd as they do not subscribe]
On Fri, May 8, 2020 at 7:36 PM Milot Mirdita <mi...@mirdita.de> wrote:

> Hi Michael,
>
> I am a developer on the MMseqs2 team and I saw your tweet regarding the
> AWS ARM64 machines earlier and checked on Debian Salsa if it would be a lot
> of work enabling ARM64 support with the next release as we worked on that
> recently.

Hey Milot, thanks for your email!

> I saw that Debian's MMseqs2 now uses SIMDe to abstract away different
> architectures. While this is a very cool technical achievement, I am very
> uncomfortable with it without being properly integrated into and monitored
> by our CI regression testing.
>
> During ARM64 development I found that there are a lot of subtle issues
> that can result in differing sensitivity between architectures (e.g.
> ARM64's default unsigned char type causes issues, there are many crashes on
> 32-bit ARM). I am also worried that our two most important platforms
> (SSE4.1 and AVX2) might suffer from performance regressions.

Interesting!

On Debian we have to provide binaries that respect the architecture baseline. That means no binaries that require SSE or SSE2 on i386/i686, and no binaries that require SSE3 or later on AMD64.

That is why we compile mmseqs2 multiple times: there is one version that doesn't violate the baseline, along with versions that should match the highest level of SIMD support available on the user's CPU.

https://salsa.debian.org/med-team/mmseqs2/-/blob/master/debian/rules#L22
https://salsa.debian.org/med-team/mmseqs2/-/blob/master/debian/bin/simd-dispatch

> We will have ARM64 and hopefully also PPC64LE support in the next release.
> I would suggest to either wait and use our upstream code, or submit a PR
> with your changes to us and see how we can integrate everything correctly.

Sure, happy to send the patches!
I meant to, but hadn't gotten around to it yet:
https://wiki.debian.org/SIMDEverywhere#Packages_Status

> Also I would be very glad if you could integrate the full regression suite
> to spot if all architectures produce consistent results. You can run the
> regression by calling from the repository:
> git submodule update --init
> ./util/regression/run_regression.sh ./path-to-mmseqs-binary scratch-directory

Oh yeah, would love to! Except we need all the upstream sources in a single tarball, which git submodules + GitHub releases makes difficult.

So if you can add a pure source (with all git submodules) tarball to https://github.com/soedinglab/MMseqs2/releases that would be appreciated!

> We had refactored this test suite to make it as easy as possible to use
> for Shayan who initially had proposed to package MMseqs2 for Debian. The
> test subfolder is badly named and contains scratch scripts for feature
> development. They don't do anything useful for testing such as finding
> performance or sensitivity drops.

Noted.

> Thanks for your work and best regards,

Thank you for sharing your work under a F/OSS license and for your contributions to Open Science!

> Milot

--
Michael R. Crusoe
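
To illustrate the multi-build approach described above, here is a minimal sketch of how a dispatch wrapper could choose among several builds. It is not the actual debian/bin/simd-dispatch script. The function names, the "baseline" label, and the restriction to the two levels named in this thread (AVX2 and SSE4.1) are assumptions made for illustration. The flag spellings "avx2" and "sse4_1" are the ones Linux reports in /proc/cpuinfo on x86.

```shell
#!/bin/sh
# Hypothetical sketch: choose the most capable build for a CPU flag list.
# All function names and the "baseline" label are illustrative only.

# has_flag FLAGS NAME: succeed if NAME occurs as a whole word in FLAGS,
# so that "avx" is not mistaken for "avx2".
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# pick_simd_build FLAGS: print the highest supported level, trying the
# most capable build first and falling back to the baseline build.
pick_simd_build() {
    for level in avx2 sse4_1; do
        if has_flag "$1" "$level"; then
            echo "$level"
            return 0
        fi
    done
    echo baseline
}

# cpu_flags: print the flag list of the first CPU (Linux x86 only).
# Prints nothing if /proc/cpuinfo is missing or has no "flags" line.
cpu_flags() {
    sed -n 's/^flags[[:space:]]*: //p' /proc/cpuinfo 2>/dev/null | head -n 1
}
```

A wrapper built on this would call pick_simd_build "$(cpu_flags)" and exec the matching binary. Falling back to the baseline build whenever detection yields nothing keeps the package usable on any CPU the architecture supports.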