On Fri, Feb 25, 2011 at 10:52:09AM +0100, Fred wrote:
> > What exactly do you mean by 'decimating'? To me it seems that you are
> > looking for matrix factorization or matrix completion techniques, which
> > are currently trendy topics in machine learning.
> By decimating, I mean this:
> input array data.shape = (nx, ny, nz) -> data[::ax, ::ay, ::az], i.e.
> output array data[::ax, ::ay, ::az].shape = (nx/ax, ny/ay, nz/az).

OK, so this can be seen as interpolation on a grid with a nearest-neighbor
interpolator. What I am unsure about is whether you want to interpolate your
NaNs, or whether they just mean missing data.

I would do this by representing the matrix as a sparse matrix in COO format,
which gives you a list of row and column positions for your data points. Then
I would use a nearest-neighbor search (such as scipy's KDTree, or
scikit-learn's BallTree for even better performance,
http://scikit-learn.sourceforge.net/modules/neighbors.html) to find, for each
grid point, which data point is closest, and fill in your grid.

I suspect that your problem is that you can't fit the whole matrix in memory.
If your data points are reasonably homogeneously distributed in the matrix, I
would simply process the problem in sub-matrices, making sure that the
nearest-neighbor search is trained on a sub-matrix that extends beyond the
sampling grid by a margin greater than the inter-point distance.

HTH,

Gael

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
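A minimal sketch of the COO + KDTree approach described above, assuming a 2-D
array with NaNs marking the missing entries (array sizes and variable names
here are illustrative, not from the thread):

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy data: a 40x40 grid where most entries are missing (NaN).
rng = np.random.default_rng(0)
data = np.full((40, 40), np.nan)
idx = rng.choice(data.size, size=200, replace=False)
data.flat[idx] = rng.standard_normal(200)

# COO-style representation: row/col coordinates of the known entries.
rows, cols = np.nonzero(~np.isnan(data))
values = data[rows, cols]

# Build a KD-tree on the known coordinates, then query every grid point
# for its nearest known data point (nearest-neighbor interpolation).
tree = cKDTree(np.column_stack([rows, cols]))
grid_r, grid_c = np.mgrid[0:data.shape[0], 0:data.shape[1]]
_, nearest = tree.query(np.column_stack([grid_r.ravel(), grid_c.ravel()]))
filled = values[nearest].reshape(data.shape)

# Decimation is then plain strided slicing on the filled grid:
decimated = filled[::4, ::4]   # shape (10, 10)
```

Known points are their own nearest neighbors, so the original values survive
the fill; the sub-matrix trick in the email amounts to running this loop over
tiles, each queried against a tree built on a slightly larger tile.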
