Hi,

**1.** Here is the documentation for numpy.cumsum:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.cumsum.html

**2.** The key example is as follows:

    >>> a = np.array([[1, 2, 3], [4, 5, 6]])
    >>> np.cumsum(a,axis=0)      # sum over rows for each of the 3 columns
    array([[1, 2, 3],
           [5, 7, 9]])
    >>> np.cumsum(a,axis=1)      # sum over columns for each of the 2 rows
    array([[ 1,  3,  6],
           [ 4,  9, 15]])

**3.** I realize this can easily be implemented with for loops. However, for NDArrays in a GPU context, I'd prefer to use vectorized ops.

In mxnet, what is the best way to compute a cumulative sum along an axis?

The final goal is to compute the integral image of a 2D matrix ( https://en.wikipedia.org/wiki/Summed-area_table ), so solutions that compute the integral image without going through a cumulative sum are fine too.

--TongKe
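To make the target concrete, here is a minimal NumPy sketch (not MXNet code) of the integral image computed two ways: with two cumulative sums, and with matrix products against triangular matrices of ones. The function names `integral_image_cumsum` and `integral_image_matmul` are hypothetical, chosen only for illustration. The second variant is the kind of loop-free formulation I have in mind; I am assuming it could be expressed with a matrix-product op on NDArrays, but I have not verified that on the GPU.

```python
import numpy as np


def integral_image_cumsum(a):
    """Summed-area table via a cumulative sum along each axis (NumPy reference)."""
    return np.cumsum(np.cumsum(a, axis=0), axis=1)


def integral_image_matmul(a):
    """Same table using only matrix products, with no loop and no cumsum.

    lower[i, k] = 1 when k <= i, so (lower @ a)[i, j] sums rows 0..i of column j.
    upper[k, j] = 1 when k <= j, so (a @ upper)[i, j] sums columns 0..j of row i.
    """
    rows, cols = a.shape
    lower = np.tril(np.ones((rows, rows), dtype=a.dtype))
    upper = np.triu(np.ones((cols, cols), dtype=a.dtype))
    return lower @ a @ upper


a = np.array([[1, 2, 3], [4, 5, 6]])
print(integral_image_cumsum(a))  # [[ 1  3  6], [ 5 12 21]]
print(integral_image_matmul(a))  # same values as above
```

The matrix-product version does more arithmetic than a true cumulative sum (roughly cubic rather than quadratic in the side length for a square input), so it is only a workaround for moderately sized matrices, not a replacement for a native cumsum op.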
