Hello Hagay, I will put this feedback on the Wiki then. Thanks for reading it!
On Tue, Jan 29, 2019 at 4:59 AM Hagay Lupesko <[email protected]> wrote:

> Good stuff Zach and Qing!
> Great feedback Edison - would be great if you leave it in the wiki, so that
> it is saved in the context of the doc with other feedback, such as
> Kellen's.
>
> On Mon, Jan 28, 2019 at 1:58 AM [email protected] <[email protected]> wrote:
>
> > Hello all,
> >
> > First let me introduce myself:
> >
> > My name is Edison Gustavo Muenz. I have worked most of my career with
> > C++, Windows and Linux. I am a big fan of machine learning and I have
> > now joined Amazon in Berlin to work on MXNet.
> >
> > I would like to give some comments on the document posted:
> >
> > # change publish OS (Severe)
> >
> > As a rule of thumb, when providing our own binaries on Linux, we should
> > always try to compile against the oldest glibc possible. Using CentOS 7
> > for this (if the CUDA issues allow it) is the way to go.
> >
> > # Using Cent OS 7
> >
> > > However, all of the current GPU build scripts would be unavailable since
> > > nvidia does not provide the corresponding packages for rpm. In this case,
> > > we may need to go with NVIDIA Docker for Cent OS 7 and that only provide a
> > > limited versions of CUDA.
> >
> > > List of CUDA that NVIDIA supporting for Cent OS 7:
> > > CUDA 10, 9.2, 9.1, 9.0, 8.0, 7.5
> >
> > From what I saw in the link provided
> > (https://hub.docker.com/r/nvidia/cuda/), this list of versions is even
> > bigger than the list of versions supported on Ubuntu 16.04.
> >
> > What am I missing?
> >
> > > Another problem we may see is the performance and stability difference
> > > on the backend we built since we downgrade libc from 2.19 to 2.17
> >
> > I would like to first give a brief intro so that we're all on the same
> > page.
> > If you already know how libc versioning works, then you can skip this
> > part.
> >
> > ## Brief intro on how libc versioning works
> >
> > In libc, each exported symbol has two components:
> > - symbol name
> > - version
> >
> > This can be seen with:
> >
> > ```
> > $ objdump -T /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
> > 00000000000bd4a0 w DF .text 0000000000000009 GLIBC_2.2.5 wmemcpy
> > 00000000001332f0 g DF .text 0000000000000019 GLIBC_2.4 __wmemcpy_chk
> > 000000000009f0e0 g iD .text 00000000000000ca GLIBC_2.14 memcpy
> > 00000000000bb460 g DF .text 0000000000000028 (GLIBC_2.2.5) memcpy
> > 00000000001318a0 g iD .text 00000000000000ca GLIBC_2.3.4 __memcpy_chk
> > ```
> >
> > So it can be seen that there is a different memory address for each
> > version of memcpy.
> >
> > When linking a binary, the linker will always choose the default (most
> > recent) version of the symbol in the libc it links against.
> >
> > An example:
> > - your program uses the `memcpy` symbol
> > - when linking, the linker will choose `memcpy` at version 2.14
> >   (latest)
> >
> > When executing the binary, the libc provided on your system must have
> > a memcpy at version 2.14; otherwise you get an error like the following:
> >
> >     /lib/x86_64-linux-gnu/libm.so.6: version `libc_2.23' not found
> >     (required by /tmp/mxnet6145590735071079280/libmxnet.so)
> >
> > Also, a symbol has its version increased when there are breaking changes.
> > So, libc will only increase the version of a symbol if any of its
> > inputs/outputs changed in a non-compatible way (e.g. changing the type of
> > a field to a non-compatible type, like int -> short).
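A minimal sketch of the check described above, in Python (not part of the original message). It parses the text printed by `objdump -T` and reports the newest GLIBC symbol version it references, which for a library such as libmxnet.so indicates the minimum glibc a target system would need. The function names and the `SAMPLE` constant are illustrative assumptions; only `objdump -T` and the `GLIBC_x.y` tag format come from the output quoted above.

```python
import re
import subprocess
from typing import Optional, Set, Tuple

Version = Tuple[int, ...]

# Matches tags such as GLIBC_2.14 or (GLIBC_2.2.5); GLIBC_PRIVATE is skipped.
_GLIBC_TAG = re.compile(r"GLIBC_(\d+(?:\.\d+)*)")


def glibc_versions(objdump_output: str) -> Set[Version]:
    """Collect every GLIBC_x.y[.z] tag in `objdump -T` text as int tuples."""
    return {
        tuple(int(part) for part in tag.split("."))
        for tag in _GLIBC_TAG.findall(objdump_output)
    }


def newest_glibc(objdump_output: str) -> Optional[str]:
    """Return the highest GLIBC version referenced, or None if there is none.

    Versions are compared as integer tuples, because a plain string
    comparison would wrongly rank "2.4" above "2.14".
    """
    versions = glibc_versions(objdump_output)
    if not versions:
        return None
    return ".".join(str(part) for part in max(versions))


def newest_glibc_of_file(path: str) -> Optional[str]:
    """Run `objdump -T` on a shared object (requires binutils) and inspect it."""
    result = subprocess.run(
        ["objdump", "-T", path], capture_output=True, text=True, check=True
    )
    return newest_glibc(result.stdout)


# The memcpy lines from the objdump output quoted above.
SAMPLE = """\
00000000000bd4a0 w DF .text 0000000000000009 GLIBC_2.2.5 wmemcpy
00000000001332f0 g DF .text 0000000000000019 GLIBC_2.4 __wmemcpy_chk
000000000009f0e0 g iD .text 00000000000000ca GLIBC_2.14 memcpy
00000000000bb460 g DF .text 0000000000000028 (GLIBC_2.2.5) memcpy
00000000001318a0 g iD .text 00000000000000ca GLIBC_2.3.4 __memcpy_chk
"""

if __name__ == "__main__":
    print(newest_glibc(SAMPLE))  # 2.14
```

Calling `newest_glibc_of_file` on a freshly built shared object and comparing the result with the glibc version of the oldest distribution to be supported would be one way to catch this kind of mismatch before publishing.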
> >
> > ## Performance difference between versions 2.17 and 2.19
> >
> > This website is really handy for this:
> > https://abi-laboratory.pro/?view=timeline&l=glibc
> >
> > If we look at the links:
> >
> > - https://abi-laboratory.pro/index.php?view=objects_report&l=glibc&v1=2.18&v2=2.19
> > - https://abi-laboratory.pro/index.php?view=objects_report&l=glibc&v1=2.17&v2=2.18
> >
> > You can see that their binary compatibility is fine, since no significant
> > changes were made between these versions that could compromise the
> > performance.
> >
> > Finally, I want to thank everyone for letting me be part of this community.
> >
> > On 2019/01/23 21:48:48, kellen sunderland <[email protected]>
> > wrote:
> > > Hey Qing, thanks for the summary and to everyone for automating the
> > > deployment process. I've left a few comments on the doc.
> > >
> > > On Wed, Jan 23, 2019 at 11:46 AM Qing Lan <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Recently Zach announced the availability for MXNet Maven publishing
> > > > pipeline and general static-build instructions. In order to make it
> > > > better, I drafted a document that includes the problems we have for
> > > > this pipeline:
> > > > https://cwiki.apache.org/confluence/display/MXNET/Outstanding+problems+with+publishing
> > > > .
> > > > Some of them may need to be addressed very soon.
> > > >
> > > > Please kindly review and leave any comments you may have in this
> > > > thread or in the document.
> > > >
> > > > thanks,
> > > > Qing
