@samskalicky Thanks! I have just thought of two models (both for relation extraction/classification) that the diag/trace operator would make easier to implement:
* Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix (https://arxiv.org/abs/1705.03995). They use a matrix to model the transition from the true label to the noisy one, and the trace of that matrix appears in the loss as a regularizer.
* Neural Relation Extraction with Selective Attention over Instances (http://aclweb.org/anthology/P16-1200). The output of this model is the diagonal of a matrix whose rows are first softmaxed. For a batch of inputs, the outputs could be obtained by taking the diagonals of several matrices at once, i.e. a batched diag.

[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12327 ] This message was relayed via gitbox.apache.org for [email protected]
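Below is a minimal NumPy sketch of the two computations described in the bullets above. It assumes that the per-example matrices are stacked into an array of shape (batch, n, n). NumPy's `np.diagonal` and `np.trace` stand in for the requested MXNet operator. The helper names, the random inputs, and the weight `beta` are hypothetical. This is not either paper's implementation.

```python
import numpy as np


def row_softmax(x):
    # Softmax over the last axis, so every row of every matrix sums to 1.
    # Subtracting the row max first avoids overflow in np.exp.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def batched_diag(x):
    # x has shape (batch, n, n); returns each matrix's diagonal, shape (batch, n).
    # np.diagonal returns a read-only view, so copy it before use.
    return np.diagonal(x, axis1=1, axis2=2).copy()


def batched_trace(x):
    # x has shape (batch, n, n); returns each matrix's trace, shape (batch,).
    return np.trace(x, axis1=1, axis2=2)


rng = np.random.default_rng(0)

# Selective-attention case: a hypothetical batch of 4 score matrices (3 x 3).
# Rows are softmaxed, then the diagonal of every matrix is taken in one call.
scores = rng.normal(size=(4, 3, 3))
probs = row_softmax(scores)
outputs = batched_diag(probs)  # shape (4, 3)

# Transition-matrix case: hypothetical row-stochastic transition matrices
# whose traces enter the loss as a regularizer. beta is an illustrative
# weight; the paper's actual sign and schedule are not reproduced here.
transitions = row_softmax(rng.normal(size=(4, 3, 3)))
beta = 0.1
trace_term = beta * batched_trace(transitions).mean()
```

Stacking the matrices along a leading batch axis lets a single call handle the whole batch. That is the convenience a batched diag/trace operator would provide.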
