Re: [Numpy-discussion] Is there a pure numpy recipe for this?

Slaunger Wed, 26 Mar 2014 14:33:28 -0700

jseabold wrote
>>
>> Well, yes, if you work with the pure f_k and g_k that is true, but this
>> two-dimensional array will have 4*10^14 elements and will exhaust my
>> memory.
>>
>> That is why I have found a more efficient method for finding only the
>> much fewer changes_at elements for each k. These arrays have unequal
>> lengths and have to be considered for each k (which is tolerable as long
>> as I avoid a further inner loop for each k in explicit Python).
>>
>> I could implement this in C and get it done sufficiently efficiently. I
>> would just like to demonstrate that this is also doable in finite time
>> in Python/numpy.
>>
> 
> If you want to attack it straight on and keep it conceptually simple,
> this looks like it would work. Fair warning, I've never done this and
> have no idea if it's actually memory and computationally efficient, so
> I'd be interested to hear from experts. I just wanted to see if it
> would work from disk. I wonder if a solution using PyTables would be
> faster.
> 
> Provided that you can chunk your data into a memmap array, then
> something you *could* do is:
> 
> import numpy as np
> 
> N = 2*10**7
> chunk_size = 100000
> 
> # Backing files for the memmaps; the scratch/ directory must already exist.
> farr1 = 'scratch/arr1'
> farr2 = 'scratch/arr2'
> 
> arr1 = np.memmap(farr1, dtype='uint8', mode='w+', shape=(N, 4))
> arr2 = np.memmap(farr2, dtype='uint8', mode='w+', shape=(N, 4))
> 
> # Fill both arrays with random 0/1 values, one chunk at a time.
> for i in range(0, N, chunk_size):
>     arr1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
>     arr2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
> 
> # Drop the writable handles so the data is flushed to disk.
> del arr1
> del arr2
> 
> # Reopen both files read-only.
> arr1 = np.memmap(farr1, mode='r', dtype='uint8', shape=(N, 4))
> arr2 = np.memmap(farr2, mode='r', dtype='uint8', shape=(N, 4))
> 
> # Count, per column, the rows where both arrays are nonzero.
> equal = np.logical_and(arr1[:chunk_size],
>                        arr2[:chunk_size]).sum(0)
> 
> for i in range(chunk_size, N, chunk_size):
>     equal += np.logical_and(arr1[i:i+chunk_size],
>                             arr2[i:i+chunk_size]).sum(0)
> 
> Skipper


Thanks for the proposal, Skipper. I have used memmap before, and this may
work, but the number of elementary AND operations needed (although hidden
under the hood of the chunked logical_and) will still be about a factor of
1000 larger than what is actually needed, because of the sparsity in the
"roots" of the logical functions I actually have. That would mean hours or
days of computation instead of a minute or a few minutes.

I think I will first give it a go using the procedure described by Jaime
(tomorrow, no more time today), as I have gone through a lot of pain
constructing the changes_at arrays using a fast and efficient method.
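
To illustrate why working on the change points is so much cheaper than
working on the dense arrays, here is a minimal sketch. It is not Jaime's
procedure and not my actual code. It assumes each logical function starts as
False and toggles at every index in a sorted, duplicate-free changes_at
array; the name and_changes is made up for this example.

```python
import numpy as np

def and_changes(changes_f, changes_g):
    """Change points of (f AND g), given the change points of f and g.

    Assumption for this sketch: each function is False before its first
    change point and toggles at every change point. Both inputs must be
    sorted 1-D integer arrays without duplicates.
    """
    # The result can only change where f or g changes.
    pts = np.union1d(changes_f, changes_g)
    # f is True at x when an odd number of its change points are <= x.
    f_on = np.searchsorted(changes_f, pts, side="right") % 2 == 1
    g_on = np.searchsorted(changes_g, pts, side="right") % 2 == 1
    both = f_on & g_on
    # Keep only the points where the AND actually flips value.
    prev = np.concatenate(([False], both[:-1]))
    return pts[both != prev]

# f is True on [2, 5) and from 10 onwards, g is True on [4, 8).
print(and_changes(np.array([2, 5, 10]), np.array([4, 8])))  # [4 5]
```

The work scales with the number of change points rather than with the
length of the full range, which is where the factor of roughly 1000 comes
from.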

--Slaunger



