Hello MXNet devs,

I'd like to start a thread discussing what our build system should look like in MXNet 2.0. Although the current make system has served us well in the past, I'd propose we remove it along with the bump to 2.0. The end goal I'd like to see is a clean build system without a bunch of conditional logic, which would make contributing to and testing MXNet a simpler process. Additionally, I'd propose we target a minimum cmake version of 3.7, for reasons described below.

First, some context on why I'd propose that we not only switch to cmake but also target a relatively new version of it (3.7, from Nov 2016). The largest benefits would apply to CUDA builds, where cmake's own functionality is quite inconsistent between versions. One persistent annoyance is the conditional logic we've had around the FindCUDA module: at one point it targeted some modern cmake features, but subsequent cmake versions tweaked the way those features work, and I now find them consistently broken, to the point that I need a bunch of -D defines to compile properly or to use an IDE.

An additional CUDA-related issue is that every time a new SM is added to NVCC we have to make a few source changes to support it. I could see this being problematic for users who may suddenly realize that, due to their compilation settings, they are not actually enabling the features they think they are on their shiny new GPUs.

As an alternative, if we target cmake 3.7 at a minimum and want to find CUDA and then build a list of reasonable PTX/BINS, we could use the following commands [1]:

----
FindCUDA(...)
...
CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0) Maxwell)
LIST(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
----

Simple, concise, and it would help make the build experience more consistent across platforms, build environments and IDEs (looking at you, CLion). We'd of course need to do a little experimentation to make sure this does indeed work as intended and can replace the complex FindCUDA logic we currently have in our build systems, but for the sake of the proposal let's assume these cmake commands work consistently, as documented, from cmake 3.7 onwards.

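To make the idea a bit more concrete, here is a rough sketch of what a minimal CMakeLists.txt along these lines might look like. It is untested against our tree and only illustrative: the project name and the MXNET_CUDA_ARCH cache variable are made up for this example, and the FindCUDA module is loaded through find_package(CUDA), which is how cmake modules of this kind are normally invoked.

```cmake
# Sketch only: assumes cmake >= 3.7 and a locally installed CUDA toolkit.
cmake_minimum_required(VERSION 3.7)

# Hypothetical project name, used only for this example.
project(mxnet_cuda_sketch LANGUAGES CXX)

# FindCUDA is a module, so it is pulled in via find_package().
find_package(CUDA REQUIRED)

# Hypothetical cache variable so users can override the targets
# (e.g. -DMXNET_CUDA_ARCH=Auto). The default is the example list from
# the FindCUDA documentation; quoting keeps "5.2(5.0)" as one list item.
set(MXNET_CUDA_ARCH "3.0;3.5+PTX;5.2(5.0);Maxwell" CACHE STRING
    "CUDA architectures to build for: Auto, Common, All, or a list")

# Translate the architecture list into nvcc flags and add them
# to the flags used for CUDA compilation.
cuda_select_nvcc_arch_flags(ARCH_FLAGS ${MXNET_CUDA_ARCH})
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
message(STATUS "nvcc arch flags: ${ARCH_FLAGS}")
```

With something like this, supporting a new SM would ideally come down to changing one list (or passing a different -D value) rather than touching the build logic, though that is exactly the part we'd need to verify by experiment.
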
To give users a chance to update their tooling, I'd also suggest we begin warning users at least one release in advance that make-based builds will be deprecated in MXNet 2.0, so they can begin migrating to cmake. I'd also want to display deprecation messages for unused cmake flags (such as the profiler flag) for a release before 2.0, and then remove them in 2.0.

Of course, not all users have cmake 3.7 on their systems; some of our employers force us to use ridiculously outdated Linux distributions. The good news for these users is that if we offer Docker compilation with an image that has a supported version of cmake, we should be able to build a portable binary that works even on very old distributions of Linux. Additionally, installing cmake from source is fairly straightforward [2] and works quite well on older distros in my experience.

Looking forward to hearing what others think. Any preferred build systems that you all would want to use? Is cmake the right system to centralize on? If so, is version 3.7 a reasonable minimum version to target? Is the 2.0 release a good point at which we can think about simplifying build logic?

1: https://cmake.org/cmake/help/v3.7/module/FindCUDA.html
2: https://github.com/Kitware/CMake