sxjscience opened a new issue #17127: [mxnet 2.0][item 7.2] RaggedNDArray in MXNet
URL: https://github.com/apache/incubator-mxnet/issues/17127

# Introduction

Many machine learning problems involve manipulating a collection of tensors with different shapes. For example, in machine translation, the source and target sentences may have different lengths. In object detection, the images have different sizes and each image is associated with a different number of bounding boxes. In graph neural networks, each node has a different number of neighbors. For learning word embeddings, it proves helpful to use higher-dimensional vectors to represent more frequent words and lower-dimensional ones for less frequent words [1]. In this scenario, the embedding “matrix” consists of a series of vectors with different lengths.

The classical solution is to preprocess the data via padding, cropping, or resizing so that all samples have the same shape and can be stacked as a batch. However, this places an additional burden on the user and introduces overhead in the data loading pipeline. Moreover, as we will see later, RaggedNDArray is able to represent data that contains a hierarchical structure, e.g., the sentence → word → character hierarchy. Thus, it is suitable for describing some hierarchical models in NLP, e.g., using a CharCNN to get the word embeddings and placing an LSTM on top for language modeling [2]. This motivates us to support RaggedNDArray as a first-class data type in MXNet.

RaggedNDArray is a general format for representing a list or a nested list of n-dimensional NDArrays with different shapes. It was proposed in the initial Gluon-API interface (https://github.com/gluon-api/gluon-api/blob/master/docs/ndarray.rst).

# References

[1] Baevski, Alexei, and Michael Auli. "Adaptive input representations for neural language modeling." ICLR (2019). [Paper Link](https://openreview.net/pdf?id=ByxZX20qFQ)

[2] Kim, Yoon, et al. "Character-aware neural language models." AAAI (2016). [Paper Link](https://arxiv.org/pdf/1508.06615.pdf)
