aaronmarkham commented on a change in pull request #15137: 1.5.0 news
URL: https://github.com/apache/incubator-mxnet/pull/15137#discussion_r297452851
##########
File path: NEWS.md
##########
@@ -17,6 +17,855 @@ MXNet Change Log
 ================
+## 1.5.0
+
+### New Features
+
+#### Automatic Mixed Precision(experimental)
+Training Deep Learning networks is a very computationally intensive task. Novel model architectures tend to have increasing numbers of layers and parameters, which slow down training. Fortunately, software optimizations and new generations of training hardware make it a feasible task.
+However, most of the hardware and software optimization opportunities exist in exploiting lower precision (e.g. FP16) to, for example, utilize Tensor Cores available on new Volta and Turing GPUs. While training in FP16 showed great success in image classification tasks, other more complicated neural networks typically stayed in FP32 due to difficulties in applying the FP16 training guidelines.
+That is where AMP (Automatic Mixed Precision) comes into play. It automatically applies the guidelines of FP16 training, using FP16 precision where it provides the most benefit, while conservatively keeping in full FP32 precision operations unsafe to do in FP16. To learn more about AMP, check out this [tutorial](https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/amp/amp_tutorial.md).
+
+#### MKL-DNN Reduced precision inference and RNN API support
+Two advanced features, fused computation and reduced-precision kernels, are introduced by MKL-DNN in the recent version. These features can significantly speed up the inference performance on CPU for a broad range of deep learning topologies. MXNet MKL-DNN backend provides optimized implementations for various operators covering a broad range of applications including image classification, object detection, and natural language processing. Refer to the [MKL-DNN operator documentation](https://github.com/apache/incubator-mxnet/blob/v1.5.x/docs/tutorials/mkldnn/operator_list.md) for more information.
+
+#### Dynamic Shape(experimental)
+MXNet now supports Dynamic Shape in both imperative and symbolic mode. MXNet used to require that operators statically infer the output shapes from the input shapes. However, there exist some operators that don't meet this requirement. Examples are:
+* while_loop: its output size depends on the number of iterations in the loop.
+* boolean indexing: its output size depends on the value of the input data.
+* many operators can be extended to take a shape symbol as input and the shape symbol can determine the output shape of these operators (with this extension, the symbol interface of MXNet can fully support shape).
+To support dynamic shape and such operators, we have modified MXNet backend. Now MXNet supports operators with dynamic shape such as [`contrib.while_loop`](https://mxnet.incubator.apache.org/api/python/ndarray/contrib.html#mxnet.ndarray.contrib.while_loop), [`contrib.cond`](https://mxnet.incubator.apache.org/api/python/ndarray/contrib.html#mxnet.ndarray.contrib.cond), and [`mxnet.ndarray.contrib.boolean_mask`](https://mxnet.incubator.apache.org/api/python/ndarray/contrib.html#contrib)
+Note: Currently dynamic shape does not work with Gluon defferred initialization.
+
+#### Large Tensor Support
+Currently, MXNet supports maximal tensor size of around 4 billon (2^32). This is due to uint32_t being used as the default data type for tensor size, as well as variable indexing.
+This limitation has created many problems when larger tensors are used in the model.
+A naive solution to this problem is to replace all uint32_t in the MXNet backend source code to int64_t.
+This solution is not viable, however, because many data structures use uint32_t as the data type for its members.
+Unnecessarily replacing these variables to int64_t will increase the memory consumption causing another limitation. Second, MXNet has many submodule dependencies.
+Updating the variable types in the MXNet repository is not enough. We also need to make sure different libraries, such as MKLDNN, MShadow etc. supports the int64_t integer data type.
+Third, many front end APIs assume unsigned 32-bit integer interface. Only updating the interface in C/C++ will cause all the language bindings to fail.
+Therefore, we need a systematic approach to enhance MXNet to support large tensors.
+Now you can enable large tensor support by changing the following build flag to 1: `USE_INT64_TENSOR_SIZE = 1`. Note this is set to 0 by default.
+For more details please refer to the [design document](https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support).
+
+#### Dependency Update
+MXNet has added support for CUDA 10, CUDA 10.1, cudnn7.5, NCCL 2.4.2, and numpy 1.16.0.
+These updates are available through PyPI packages and build from source, refer to [installation guid](https://mxnet.incubator.apache.org/versions/master/install/index.html) for more details.
+
+#### Gluon Fit API(experimental)
+Training a model in Gluon requires users to write the training loop. This is useful because of its imperative nature, however repeating the same code across multiple models can become tedious and repetitive with boilerplate code.
+The training loop can also be overwhelming to some users new to deep learning. We have introduced an Estimator and Fit API to help facilitate training loop.
+Note: this feature is still experimental, for more details, refer to [design document](https://cwiki.apache.org/confluence/display/MXNET/Gluon+Fit+API+-+Tech+Design).
+
+#### New Operators
+* split_v2 (#13687)
+* Gradient multiplier (contrib) operator (#13632)
+* Image normalize operator - GPU support, 3D/4D inputs (#13802)
+* Image ToTensor operator - GPU support, 3D/4D inputs (#13837)
+* Add Gluon Transformer Crop (#14259)
+* GELU (#14449)
+* AdamW operator (Fixing Weight Decay Regularization in Adam) (#13728)
+* [MXNET-1382] Add the index_array operator (#14638)
+* add an operator for computing the likelihood of a Hawkes self-exciting process (#14683)
+* Add numpy linspace (#14927)
+
+
+### Feature Improvements
+
+#### Operators
+* make ROIAlign support position-sensitive pooling (#13088)
+* Add erfinv operator for calculating inverse error function (#13811)
+* Added optional parameters to BilinearResize2D to do relative scaling (#13985)
+* MXNET-1295 Adding integer index support to Sequence* family of operators. (#13880)
+* Export resize and support batch size (#14014)
+* CUDNN dropout (#13896)
+* Relaxing type requirements for slice_like op (#14097)
+* Relaxing type requirements for reshape_like op (#14325)
+* Parallelize CPU version and add GPU version of boolean_mask op (#14090)
+* Add NHWC layout support to Pooling (cpu, gpu cuda, gpu cuDNN) (#13749)
+* Multi-precision AdamW update op (#14171)
+* [op] add back support for scalar type rescale_grad argument for adamw_update/mp_adamw_update (#14221)
+* move choose_element_0index to operator (#14273)
+* Optimize NMS (#14290)
+* Optimize NMS part 2 (#14352)
+* add backgroud class in box_nms (#14058)
+* Use cudnn for dropout by default (#14278)
+* In-place updates for Nadam, Adadelta, Adamax and SGLD (#13960)
+* Aggregate SGD (#13346)
+* Add proper exception message for negative shape in array creation routines (#14362)
+* Support multi-threading for Custom Operator (#14363)
+* moveaxis operator now accepts negative indices and sequence of ints as well. (#14321)
+* Support SyncBatchNorm5D (#14542)
+* Add nd.power and sym.pow (#14606)
+* Change RNN OP to stateful (#14476)
+* Add imresize and copyMakeBorder to mx.image (#13357)
+* add ctx for rand_ndarray and rand_sparse_ndarray (#14966)
+* Add cpu implementation for Deformable PSROIPooling (#14886)
+* Add warning for fp16 inputs with MXNET_SAFE_ACCUMULATION=0 (#15046)
+* Safe LayerNorm (#15002)
+* use MXNET_SAFE_ACCUMULATION for softmax accumulator (#15037)
+* LayerNorm acceleration on GPU (#14935)
+* Add matrix inversion operator in linalg (#14963)
+* implementation for equivalence of tf.moments (#14842)
+* Use env var to enforce safe accumulation in ReduceAxesCompute (#14830)
+* [MXNet-1211] Factor and "Like" modes in BilinearResize2D operator (#13226)
+* added extraction/generation of diagonal and triangonal matrices to linalg (#14501)
+* [Mxnet-1397] Support symbolic api for requantize and dequantize (#14749)
+* [MXNET-978] Support higher order gradient for `log`. (#14992)
+* Add cpu implementation for Deformable Convolution (#14879)
+
+#### MKLDNN
+* Feature/mkldnn static (#13628)
+* Feature/mkldnn static 2 (#13503)
+* support mkl log when dtype is fp32 or fp64 (#13150)
+* Add reshape op supported by MKL-DNN (#12980)
+* Move the debug output message into MXNET_MKLDNN_DEBUG (#13662)
+* Integrate MKLDNN Conv1d and support 3d layout (#13530)
+* Making MKL-DNN default on MXNet master (#13681)
+* Add mkldnn OP for slice (#13730)
+* mkldnn s8 conv API change for master (#13903)
+* [MKLDNN] Enable signed int8 support for convolution. (#13697)
+* add mkldnn softmax_output (#13699)
+* MKLDNN based Quantized FullyConnected Operator and its fusion (#14128)
+* Fix entropy for uint8 (#14150)
+* Update MKL-DNN to v0.18 release (was: fix the Dense layer issue) (#13668)
+* [MKL-DNN] Enable s8 support for inner product and 3d input with flatten=false (#14466)
+* Optimize transpose operator with MKL-DNN (#14545)
+* [MKLDNN] Remove repeat parts in MKLDNN.md (#14995)
+* [MKLDNN] Enable more convolution + activation fusion (#14819)
+* Update MKL-DNN submodule to v0.19 (#14783)
+* Add mkldnn_version.h to pip package (#14899)
+* [MKLDNN] add quantized sum (#14614)
+* [MKLDNN]Refactor requantize to speed up execution (#14608)
+* [MKLDNN]Add quantized relu (#14604)
+* Add MKLDNN headers to pip package (#14339)
+* add symbolic link to mkldnn header files in include (#14300)
+* disable default MKLDNN for cross compilation (#13893)
+* Update MKLDNN_README.md (#13653)
+* [Quantization] Support zero-size tensor input for quantization flow (#15031)
+* Support 3D input for MKL-DNN softmax operator (#14818)
+* Add primitive cache for MKL-DNN sum(elemwise_add operator (#14914)
+* Fix reshape to add in-place back (#14903)
+* [int8] Add MobileNetV2_1.0 & ResNet18 Quantization (#14823)
+* [MKLDNN]Improve quantizeV2 and dequantize latency (#14641)
+* added mkldnn dependency for plugin compile target (#14274)
+* Support Quantized Fully Connected by INT8 GEMM (#12922)
+
+#### ONNX
+* ONNX export: Instance normalization, Shape (#12920)
+* ONNX export: Logical operators (#12852)
+* ONNX import/export: Size (#13112)
+* ONNX export: Add Flatten before Gemm (#13356)
+* ONNX import/export: Add missing tests, ONNX export: LogSoftMax (#13654)
+* ONNX import: Hardmax (#13717)
+* [MXNET-898] ONNX import/export: Sample_multinomial, ONNX export: GlobalLpPool, LpPool (#13500)
+* ONNX ops: norm exported and lpnormalization imported (#13806)
+* [MXNET-880] ONNX export: Random uniform, Random normal, MaxRoiPool (#13676)
+* ONNX export: Add Crop, Deconvolution and fix the default stride of Pooling to 1 (#12399)
+* onnx export ops (#13821)
+* ONNX export: broadcast_to, tile ops (#13981)
+* ONNX export: Support equal length splits (#14121)
+
+#### TensorRT
+* [MXNET-1252][1 of 2] Decouple NNVM to ONNX from NNVM to TenosrRT conversion (#13659)
+* [MXNET-703] Update to TensorRT 5, ONNX IR 3. Fix inference bugs. (#13310)
+* [MXNET-703] Minor refactor of TensorRT code (#13311)
+* reformat trt to use subgraph API, add fp16 support (#14040)
+
+#### FP16 Support
+* Update mshadow to support batch_dot with fp16. (#13716)
+* float32 → float16 cast consistency across implementations (#13857)
+* modifying SyncBN doc for FP16 use case (#14041)
+* support dot(vector, vector) for fp16 inputs on GPU (#14102)
+* softmax for fp16 with fp32 accumulator (#14098)
+* [MXNET-1327] Allow RNN Layers to be initialized to fp16 (#14219)
+* fp16 safe norm operator (#14616)
+* NAG Optimizer with multi-precision support (#14568)
+
+#### Deep Graph Library(DGL) support
+* Add graph_compact operator. (#13436)
+* Accelerate DGL csr neighbor sampling (#13588)
+
+#### Horovod Integration
+* Add extra header file to export for error checking (#13795)
+* whitelist symbols for using MXNet error handling externally (#13812)
+* Use CPUPinned context in ImageRecordIOParser2 (#13980)
+* Add pin_device_id option to Gluon DataLoader (#14136)
+
+#### Dynamic Shape
+* [MXNET-1315] Add checks for dynamic-shaped operators in CachedOp (#14018)
+* [MXNET-1325] Make InferShapeAttr a standalone pass (#14193)
+* [MXNET-1324] Add NaiveRunGraph to imperative utils (#14192)
+* [MXNET-1352] Allow dynamic shape in while_loop and if conditionals (#14393)
+
+#### Backend Engine
+* Add infer_type_partial (#14214)
+* Tidy up storage allocation and deallocation (#14480)
+* Add MXEnginePushAsync and MXEnginePushSync C APIs (#14615)
+* Enhance subgraph API (#14113)
+* Enhance PartitionGraph (#14277)
+* Allow clearing gpu cache (#14252)
+* Fix warning / static function in header. (#14900)
+* Simplify creation of NodeEntry instances and use emplace_back (#14095)
+* Add unpooled gpu memory type (#14716)
+* [MXNET-1398] Enable zero-copy from numpy to MXNet NDArray (#14733)
+* Use DEFAULT macro in C APIs (#14767)
+* Avoid uneccesary vector copies in imperative_utils.cc (#14665)

Review comment:
```suggestion
* Avoid unnecessary vector copies in imperative_utils.cc (#14665)
```
