Hi! 

I just found that the `diag` operator does not support N-d arrays with `N > 2`.
In my experience, it would be more useful if the `N > 2` cases were properly
designed. For example, I find it troublesome to take the diagonals of several
matrices of the same shape at once. I know this can be done with a combination
of `arange`, `tile` and `pick`, but that is complicated, confusing and
error-prone.

To support this, `diag` with `N > 2` could take the diagonal over the last two
axes: given an array of shape `[d1, d2, d3, ..., dn-2, dn-1, dn]`, where the
diagonal of a `[dn-1, dn]` matrix has length `k`, `diag` would return an array
of shape `[d1, d2, d3, ..., dn-2, k]`. This could also be made more flexible,
for example by letting the caller specify which axes to reduce.
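
To make the proposed semantics concrete, here is a minimal sketch using NumPy rather than MXNet. `batched_diag` is a hypothetical helper name, not an existing MXNet operator, and the `axis1`/`axis2` parameters are only one possible way to expose the more flexible variant. It relies on `numpy.diagonal`, which places the extracted diagonal on the last axis of the result.

```python
import numpy as np


def batched_diag(x, axis1=-2, axis2=-1):
    """Hypothetical helper: take the diagonal over two axes of an N-d array.

    With the defaults, an input of shape [d1, ..., dn-2, dn-1, dn] yields
    an output of shape [d1, ..., dn-2, k], where k = min(dn-1, dn).
    """
    x = np.asarray(x)
    if x.ndim < 2:
        raise ValueError("expected an array with at least 2 dimensions")
    # np.diagonal removes axis1 and axis2 and appends the diagonal
    # as the last axis of the result.
    return np.diagonal(x, axis1=axis1, axis2=axis2)


# Two 3x4 matrices stacked along the first axis.
x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
d = batched_diag(x)
print(d.shape)  # (2, 3)
print(d)        # [[ 0  5 10]
                #  [12 17 22]]
```

Passing other axes, for example `batched_diag(x, axis1=0, axis2=1)`, would cover the case where the caller chooses which axes to reduce.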

PyTorch provides a `diag` operator that behaves the same way as MXNet's current
one. TensorFlow splits the functionality into two operators, `diag` and
`diag_part`: the former constructs diagonal matrices and the latter extracts
diagonals from matrices. Both support `N > 2`, but not in a way I find useful
or flexible.

On the MXNet forum: https://discuss.mxnet.io/t/diag-for-n-d-arrays/1707

[ Full content available at: 
https://github.com/apache/incubator-mxnet/issues/12327 ]