On Fri, Sep 10, 2010 at 1:40 PM, Adam <[email protected]> wrote:
> I'm keeping a large number of data points in multiple 2d arrays, for
> example:
>
> class c(object):
>     def __init__(self):
>         self.a = np.zeros((24, 60))
>         self.b = np.zeros((24, 60))
>         ...
>
> After processing the data, I'm serializing these to disk for future
> reference/post-processing. It's a largish amount of data and is only
> going to get larger.
>
> Would it be more efficient (in terms of memory/disk storage) to use a
> single n-dimensional array:
>
> self.a = np.zeros((24, 60, 5))
>
> What other advantages (if any) would I gain from storing the data in a
> single array rather than multiple? The deeper into this project I get,
> the more I am probably going to need to correlate data points from one
> or more of the arrays to one or more of the other arrays.
>
> I think I just answered my own question...
>

Adam,

One argument *against* merging all of the data into a single array is
that the array needs a contiguous portion of memory. If the quantity of
data reaches a certain point, the OS may have to do more work in
allocating the space for your array. Smaller chunks may fit better. Of
course, this is entirely dependent upon usage patterns, available RAM,
OS, the day of the week, and the color of your socks. YMMV.

Ben Root
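P.S. For what it's worth, here is a minimal sketch of the single-array
layout you describe. The class name C, the n_fields count, and the file
names are just placeholders, not anything from your actual code:

    import numpy as np

    class C(object):
        def __init__(self, n_fields=5):
            # One (24, 60, n_fields) block instead of n_fields separate
            # (24, 60) arrays; data[..., 0] plays the role of self.a,
            # data[..., 1] the role of self.b, and so on.
            self.data = np.zeros((24, 60, n_fields))

    c = C()
    c.data[..., 0] += 1.0            # update what used to be self.a

    # Serialize the whole block to disk in one shot.
    np.save('data.npy', c.data)
    restored = np.load('data.npy')

    # If you keep the arrays separate after all, np.savez still bundles
    # them into a single .npz file under named keys:
    # np.savez('data.npz', a=c.a, b=c.b)

    # NumPy allocates the merged block contiguously, which is the
    # allocation cost mentioned above; you can verify with:
    print(c.data.flags['C_CONTIGUOUS'])   # True

The slicing also makes cross-array correlation straightforward, since
everything lives on one set of axes.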
