ChaiBapchya commented on issue #17444: [Large Tensor] Add LT support for NN optimizers and 1 activation function
URL: https://github.com/apache/incubator-mxnet/pull/17444#issuecomment-580900241

So I tested MXNet (built from source on this branch) with these feature flags:

```
python -c "from mxnet.runtime import feature_list; print(feature_list())"
[✔ CUDA, ✔ CUDNN, ✖ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✔ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✔ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]
```

Results for training 10 epochs on 8 GPUs:

```
INFO:root:[Epoch 0] train=0.120292 val=0.158000 loss=6.658037 time: 109.734473
INFO:root:[Epoch 1] train=0.167548 val=0.179600 loss=2.297145 time: 92.212359
INFO:root:[Epoch 2] train=0.210777 val=0.237700 loss=2.109626 time: 92.110430
INFO:root:[Epoch 3] train=0.240705 val=0.255700 loss=2.032153 time: 92.476469
INFO:root:[Epoch 4] train=0.262039 val=0.273600 loss=1.976788 time: 94.570572
INFO:root:[Epoch 5] train=0.279728 val=0.302300 loss=1.915808 time: 91.655044
INFO:root:[Epoch 6] train=0.295393 val=0.309900 loss=1.868357 time: 94.903087
INFO:root:[Epoch 7] train=0.312901 val=0.331600 loss=1.825083 time: 94.501921
INFO:root:[Epoch 8] train=0.330889 val=0.334100 loss=1.788333 time: 95.653459
INFO:root:[Epoch 9] train=0.344211 val=0.349900 loss=1.757741 time: 94.065917
```

Is this fine?
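Side note: since this PR is specifically about Large Tensor support, the flag that matters in the list above is ✔ INT64_TENSOR_SIZE. A more targeted check is possible with the same `mxnet.runtime` module used above; this is just a minimal sketch, assuming the `Features` class from that module is available in this build:

```python
# Minimal sketch: verify the Large Tensor build flag directly,
# using mxnet.runtime.Features (same module as feature_list above).
from mxnet.runtime import Features

features = Features()
# INT64_TENSOR_SIZE is the compile-time flag that enables tensors
# with more than 2**31 - 1 elements.
assert features.is_enabled('INT64_TENSOR_SIZE'), 'Large Tensor support not compiled in'
print('INT64_TENSOR_SIZE enabled:', features.is_enabled('INT64_TENSOR_SIZE'))
```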
