jseabold wrote
>> Well, yes, if you work with the pure f_k and g_k that is true, but this
>> two-dimensional array will have 4*10^14 elements and will exhaust my
>> memory.
>>
>> That is why I have found a more efficient method for finding only the
>> much fewer changes_at elements for each k, and these arrays have unequal
>> length, and have to be considered for each k (which is tolerable as long
>> as I avoid a further inner loop for each k in explicit Python).
>>
>> I could implement this in C and get it done sufficiently efficiently. I
>> just like to make a point in demonstrating this is also doable in finite
>> time in Python/numpy.
>
> If you want to attack it straight on and keep it conceptually simple,
> this looks like it would work. Fair warning, I've never done this and
> have no idea if it's actually memory and computationally efficient, so
> I'd be interested to hear from experts. I just wanted to see if it
> would work from disk. I wonder if a solution using PyTables would be
> faster.
>
> Provided that you can chunk your data into a memmap array, then
> something you *could* do
>
> import numpy as np
>
> N = 2*10**7
> chunk_size = 100000
>
> # the scratch/ directory must already exist
> farr1 = 'scratch/arr1'
> farr2 = 'scratch/arr2'
>
> arr1 = np.memmap(farr1, dtype='uint8', mode='w+', shape=(N, 4))
> arr2 = np.memmap(farr2, dtype='uint8', mode='w+', shape=(N, 4))
>
> # fill both arrays on disk, one chunk at a time
> for i in range(0, N, chunk_size):
>     arr1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
>     arr2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
>
> # flush to disk and reopen read-only
> del arr1
> del arr2
>
> arr1 = np.memmap(farr1, mode='r', dtype='uint8', shape=(N, 4))
> arr2 = np.memmap(farr2, mode='r', dtype='uint8', shape=(N, 4))
>
> # per-column count of rows where both arrays are nonzero
> equal = np.logical_and(arr1[:chunk_size], arr2[:chunk_size]).sum(0)
>
> for i in range(chunk_size, N, chunk_size):
>     equal += np.logical_and(arr1[i:i+chunk_size], arr2[i:i+chunk_size]).sum(0)
>
> Skipper

Thanks for the proposal, Skipper. I have used memmap before, and this may work. However, the number of elementary AND operations needed (although hidden under the hood of the chunked logical_and) would be about a factor of 1000 larger than what is actually needed, because the "roots" of the logical functions I actually have are sparse. That would mean hours or days of computation instead of a minute or a few minutes.

I think I will first give it a go using the procedure described by Jaime (tomorrow, no more time today), as I have gone through a lot of pain constructing the changes_at arrays using a fast and efficient method.

--Slaunger
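
To illustrate the sparsity argument, here is a minimal sketch. It is not the procedure Jaime described, and it is not how the actual changes_at arrays are built. It assumes each logical function is a step function over the integers 0..n-1 that starts out False and flips at every position listed in a sorted array of change points; the name count_both_true and that representation are assumptions made only for this example. Under those assumptions the work scales with the number of change points rather than with n.

```python
import numpy as np

def count_both_true(changes_f, changes_g, n):
    """Count x in [0, n) where both step functions are True.

    Assumed representation: each function is False at 0 and flips
    at every position in its sorted array of change points.
    """
    # Breakpoints where either function may change, plus the domain ends.
    breaks = np.union1d(np.union1d(changes_f, changes_g), [0, n])
    breaks = breaks[(breaks >= 0) & (breaks <= n)]
    starts, stops = breaks[:-1], breaks[1:]
    # Number of flips at or before each interval start; odd means True.
    f_on = np.searchsorted(changes_f, starts, side='right') % 2 == 1
    g_on = np.searchsorted(changes_g, starts, side='right') % 2 == 1
    # Sum the lengths of the intervals on which both functions are True.
    return int((stops - starts)[f_on & g_on].sum())

# f is True on [2, 5), g is True on [3, 8), so both are True on [3, 5).
print(count_both_true(np.array([2, 5]), np.array([3, 8]), 10))  # 2
```

The dense equivalent would evaluate both functions at all n positions and AND them, which is what the chunked logical_and does; the sketch touches only the merged change points.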
