Eli Bressert wrote:
Hi,
I'm using masked arrays to compute large-scale standard deviation,
multiplication, gaussian, and weighted averages. At first I thought
using the masked arrays would be a great way to sidestep looping
(which it is), but it's still slower than expected. Here's a snippet
of the code that I'm using it for.
[...]
# Like the spatial_weight section, this takes about 20 seconds
W = spatial_weight / Rho2
# Takes less than one second.
Ave = np.average(av_good,axis=1,weights=W)
Any ideas on why it would take such a long time for processing?
A part of the slowdown is what looks to me like unnecessary copying in
_MaskedBinaryOperation.__call__. It is using getdata, which applies
numpy.array to its input, forcing a copy. I think the copy is actually
unintentional, in at least one sense, and possibly two: first, because
the default argument of getattr is always evaluated, even if it is not
needed; and second, because the call to np.array is used where
np.asarray or equivalent would suffice.
The first file attached below shows the kernprof in the case of
multiplying two masked arrays, shape (100000,50), with no masked
elements; 2/3 of the time is taken copying the data.
Now, if there are actually masked elements in the arrays, it gets much
worse: see the second attachment. The total time has increased by more
than a factor of 3, and the culprit is numpy.which(), a very slow
function. It looks to me like it is doing nothing useful at all; the
numpy binary operation is still being executed for all elements,
regardless of mask, contrary to the intention implied by the comment in
the code.
The third attached file has a patch that fixes the getdata problem and
eliminates the which().
With this patch applied we get the profile in the 4th file, to be
compared to the second profile. Much better. I am pretty sure it could
still be sped up quite a bit, though. It looks like the masks are
essentially being calculated twice for no good reason, but I don't
completely understand all the mask considerations, so at this point I am
not trying to fix that problem.
Eric
Especially the spatial_weight and W variables? Would there be a faster
way to do this? Or is there a way that numpy.std can process ignore
nan's when processing?
Thanks,
Eli Bressert
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
efir...@manini:~/test$ kernprof.py -lv profile_ma_mult.py
Wrote profile results to profile_ma_mult.py.lprof
Timer unit: 1e-06 s
File: /usr/local/lib/python2.6/dist-packages/numpy/ma/core.py
Function: __call__ at line 689
Total time: 0.189224 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
689 @profile
690 def __call__ (self, a, b,
*args, **kwargs):
691 "Execute the call
behavior."
692 1 52 52.0 0.0 m =
mask_or(getmask(a), getmask(b), shrink=False)
693 1 122614 122614.0 64.8 (da, db) =
(getdata(a), getdata(b))
694 # Easy case: there's
no mask...
695 1 7 7.0 0.0 if m is nomask:
696 1 66282 66282.0 35.0 result =
self.f(da, db, *args, **kwargs)
697 # There are some
masked elements: run only on the unmasked
698 else:
699 result =
np.where(m, da, self.f(da, db, *args, **kwargs))
700 # Transforms to a
(subclass of) MaskedArray if we don't have a scalar
701 1 10 10.0 0.0 if result.shape:
702 1 124 124.0 0.1 result =
result.view(get_masked_subclass(a, b))
703 # If we have a
mask, make sure it's broadcasted properly
704 1 57 57.0 0.0 if m.any():
705 result._mask =
mask_or(getmaskarray(a), getmaskarray(b))
706 # If some initial
masks where not shrunk, don't shrink the result
707 1 4 4.0 0.0 elif m.shape:
708 result._mask =
make_mask_none(result.shape, result.dtype)
709 1 4 4.0 0.0 if isinstance(a,
MaskedArray):
710 1 33 33.0 0.0
result._update_from(a)
711 1 4 4.0 0.0 if isinstance(b,
MaskedArray):
712 1 30 30.0 0.0
result._update_from(b)
713 # ... or return masked
if we have a scalar and the common mask is True
714 elif m:
715 return masked
716 1 3 3.0 0.0 return result
efir...@manini:~/test$ kernprof.py -lv profile_ma_mult.py
Wrote profile results to profile_ma_mult.py.lprof
Timer unit: 1e-06 s
File: /usr/local/lib/python2.6/dist-packages/numpy/ma/core.py
Function: __call__ at line 689
Total time: 0.649928 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
689 @profile
690 def __call__ (self, a, b,
*args, **kwargs):
691 "Execute the call
behavior."
692 1 30012 30012.0 4.6 m =
mask_or(getmask(a), getmask(b), shrink=False)
693 1 120784 120784.0 18.6 (da, db) =
(getdata(a), getdata(b))
694 # Easy case: there's
no mask...
695 1 7 7.0 0.0 if m is nomask:
696 result =
self.f(da, db, *args, **kwargs)
697 # There are some
masked elements: run only on the unmasked
698 else:
699 1 418399 418399.0 64.4 result =
np.where(m, da, self.f(da, db, *args, **kwargs))
700 # Transforms to a
(subclass of) MaskedArray if we don't have a scalar
701 1 10 10.0 0.0 if result.shape:
702 1 111 111.0 0.0 result =
result.view(get_masked_subclass(a, b))
703 # If we have a
mask, make sure it's broadcasted properly
704 1 25419 25419.0 3.9 if m.any():
705 1 55096 55096.0 8.5 result._mask =
mask_or(getmaskarray(a), getmaskarray(b))
706 # If some initial
masks where not shrunk, don't shrink the result
707 elif m.shape:
708 result._mask =
make_mask_none(result.shape, result.dtype)
709 1 7 7.0 0.0 if isinstance(a,
MaskedArray):
710 1 46 46.0 0.0
result._update_from(a)
711 1 4 4.0 0.0 if isinstance(b,
MaskedArray):
712 1 30 30.0 0.0
result._update_from(b)
713 # ... or return masked
if we have a scalar and the common mask is True
714 elif m:
715 return masked
716 1 3 3.0 0.0 return result
Index: numpy/ma/core.py
===================================================================
--- numpy/ma/core.py (revision 6875)
+++ numpy/ma/core.py (working copy)
@@ -3,7 +3,7 @@
numpy.ma : a package to handle missing or invalid values.
This package was initially written for numarray by Paul F. Dubois
-at Lawrence Livermore National Laboratory.
+at Lawrence Livermore National Laboratory.
In 2006, the package was completely rewritten by Pierre Gerard-Marchant
(University of Georgia) to make the MaskedArray class a subclass of ndarray,
and to improve support of structured arrays.
@@ -12,7 +12,7 @@
Copyright 1999, 2000, 2001 Regents of the University of California.
Released for unlimited redistribution.
* Adapted for numpy_core 2005 by Travis Oliphant and (mainly) Paul Dubois.
-* Subclassing of the base ndarray 2006 by Pierre Gerard-Marchant
+* Subclassing of the base ndarray 2006 by Pierre Gerard-Marchant
(pgmdevlist_AT_gmail_DOT_com)
* Improvements suggested by Reggie Dugard (reggie_AT_merfinllc_DOT_com)
@@ -482,7 +482,10 @@
[3, 4]])
"""
- data = getattr(a, '_data', np.array(a, subok=subok))
+ try:
+ data = getattr(a, '_data')
+ except AttributeError:
+ data = np.array(a, copy=False, subok=subok)
if not subok:
return data.view(ndarray)
return data
@@ -690,12 +693,7 @@
"Execute the call behavior."
m = mask_or(getmask(a), getmask(b), shrink=False)
(da, db) = (getdata(a), getdata(b))
- # Easy case: there's no mask...
- if m is nomask:
- result = self.f(da, db, *args, **kwargs)
- # There are some masked elements: run only on the unmasked
- else:
- result = np.where(m, da, self.f(da, db, *args, **kwargs))
+ result = self.f(da, db, *args, **kwargs)
# Transforms to a (subclass of) MaskedArray if we don't have a scalar
if result.shape:
result = result.view(get_masked_subclass(a, b))
@@ -2838,7 +2836,7 @@
baseclass = property(fget= lambda self:self._baseclass,
doc="Class of the underlying data (read-only).")
-
+
def _get_data(self):
"""Return the current data, as a view of the original
underlying data.
Wrote profile results to profile_ma_mult.py.lprof
Timer unit: 1e-06 s
File: /usr/local/lib/python2.6/dist-packages/numpy/ma/core.py
Function: __call__ at line 692
Total time: 0.176606 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
692 @profile
693 def __call__ (self, a, b,
*args, **kwargs):
694 "Execute the call
behavior."
695 1 30007 30007.0 17.0 m =
mask_or(getmask(a), getmask(b), shrink=False)
696 1 34 34.0 0.0 (da, db) =
(getdata(a), getdata(b))
697 1 65855 65855.0 37.3 result = self.f(da,
db, *args, **kwargs)
698 # Transforms to a
(subclass of) MaskedArray if we don't have a scalar
699 1 11 11.0 0.0 if result.shape:
700 1 136 136.0 0.1 result =
result.view(get_masked_subclass(a, b))
701 # If we have a
mask, make sure it's broadcasted properly
702 1 25425 25425.0 14.4 if m.any():
703 1 55048 55048.0 31.2 result._mask =
mask_or(getmaskarray(a), getmaskarray(b))
704 # If some initial
masks where not shrunk, don't shrink the result
705 elif m.shape:
706 result._mask =
make_mask_none(result.shape, result.dtype)
707 1 7 7.0 0.0 if isinstance(a,
MaskedArray):
708 1 45 45.0 0.0
result._update_from(a)
709 1 4 4.0 0.0 if isinstance(b,
MaskedArray):
710 1 31 31.0 0.0
result._update_from(b)
711 # ... or return masked
if we have a scalar and the common mask is True
712 elif m:
713 return masked
714 1 3 3.0 0.0 return result
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion