On 6/18/19 10:06 AM, Marten van Kerkwijk wrote:
> Hi Allen,
>
> Thanks for the message and link! In astropy, we've been struggling with
> masking a lot, and one of the main conclusions I have reached is that
> ideally one has a more abstract `Masked` class that can take any type of
> data (including `ndarray`, of course), and behaves like that data as
> much as possible, to the extent that if, e.g., I create a
> `Masked(Quantity(..., unit), mask=...)`, the instance will have a
> `.unit` attribute and perhaps even `isinstance(..., Quantity)` will
> hold. And similarly for `Masked(Time(...), mask=...)`,
> `Masked(SkyCoord(...), mask=...)`, etc. In a way, `Masked` would be a
> kind of Mixin-class that just tracks a mask attribute.
>
> This may be too much to ask from the initializer, but, if so, it still
> seems most useful if it is made as easy as possible to do, say, `class
> MaskedQuantity(Masked, Quantity): <very few overrides>`.
Currently MaskedArray does not accept ducktypes as underlying arrays,
but I think it shouldn't be too hard to modify it to do so. Good idea!
I already partly navigated this mixin-issue in the
"MaskedArrayCollection" class, which essentially does
ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
boilerplate. That's the backwards encapsulation order from what you want
though.
> Even if this impossible, I think it is conceptually useful to think
> about what the masking class should do. My sense is that, e.g., it
> should not attempt to decide when an operation succeeds or not, but just
> "or together" input masks for regular, multiple-input functions, and let
> the underlying arrays skip elements for reductions by using `where`
> (hey, I did implement that for a reason... ;-). In particular, it
> suggests one should not have things like domains and all that (I never
> understood why `MaskedArray` did that). If one wants more, the class
> should provide a method that updates the mask (a sensible default might
> be `mask |= ~np.isfinite(result)` - here, the class being masked should
> logically support ufuncs and functions, so it can decide what "isfinite"
> means).
I agree it would be nice to remove domains. It would make life easier,
and I could remove a lot of twiddly code! I kept it in for now to
minimize the behavior changes from the old MaskedArray.
> In any case, I would think that a basic truth should be that everything
> has a mask with a shape consistent with the data, so
> 1. Each complex numbers has just one mask, and setting `a.imag` with a
> masked array should definitely propagate the mask.
> 2. For a masked array with structured dtype, I'd similarly say that the
> default is for a mask to have the same shape as the array. But that
> something like your collection makes sense for the case where one wants
> to mask items in a structure.
Agreed that we should have a single bool per complex or structured
element, and the mask shape is the same as the array shape. That's how I
implemented it. But there is still a problem with complex.imag assignment:
>>> a = MaskedArray([1j, 2, X])
>>> i = a.imag
>>> i[:] = MaskedArray([1, X, 1])
If we make the last line copy the mask to the original array, what
should the real part of a[2] be? Conversely, if we don't copy the mask,
what should the imag part of a[1] be? It seems like we might "want" the
masks to be OR'd instead, but then should i[2] be masked after we just
set it to 1?
> All the best,
>
> Marten
>
> p.s. I started trying to implement the above "Mixin" class; will try to
> clean that up a bit so that at least it uses `where` and push it up.
I played with "where", but didn't include it since 1.17 is not released.
To avoid duplication of effort, I've attached a diff of what I tried. I
actually get a slight slowdown of about 10% by using where...
If you make progress with the mixin, a push is welcome. I imagine a
problem is going to be that np.isscalar doesn't work to detect duck scalars.
Cheers,
Allan
> On Mon, Jun 17, 2019 at 6:43 PM Allan Haldane <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hi all,
>
> Chuck suggested we think about a MaskedArray replacement for 1.18.
>
> A few months ago I did some work on a MaskedArray replacement using
> `__array_function__`, which I got mostly working. It seems like a good
> time to bring it up for discussion now. See it at:
>
> https://github.com/ahaldane/ndarray_ducktypes
>
> It should be very usable, it has docs you can read, and it passes a
> pytest-suite with 1024 tests adapted from numpy's MaskedArray tests.
> What is missing? It needs even more tests for new functionality, and a
> couple numpy-API functions are missing, in particular `np.median`,
> `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could
> also find many things to improve too.
>
> Besides fixing many little annoyances from MaskedArray, and simplifying
> the logic by always storing the mask in full, it also has new features.
> For instance it allows the use of a "X" variable to mark masked
> locations during array construction, and I solve the issue of how to
> mask individual fields of a structured array differently.
>
> At this point I would by happy to get some feedback on the design and
> what seems good or bad. If it seems like a good start, I'd be happy to
> move it into a numpy repo of some sort for further collaboration &
> discussion, and maybe into 1.18. At the least I hope it can serve as a
> design study of what we could do.
>
>
>
>
>
> Let me also drop here two more interesting detailed issues:
>
> First, the issue of what to do about .real and .imag of complex arrays,
> and similarly about field-assignment of structured arrays. The problem
> is that we have a single mask bool per element of a complex array, but
> what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the
> mask of the original array change? Should we make .real and .imag
> readonly?
>
> Second, a more general issue of how to ducktype scalars when using
> `__array_function__` which I think many ducktype implementors will have
> to face. For MaskedArray, I created an associated "MaskedScalar" type.
> However, MaskedScalar has to behave differently from normal numpy
> scalars in a number of ways: It is not part of the numpy scalar
> hierarchy, it fails checks `isinstance(var, np.floating)`, and
> np.isscalar returns false. Numpy scalar types cannot be subclassed. We
> have discussed before the need to have distinction between 0d-arrays and
> scalars, so we shouldn't just use a 0d (in fact, this makes printing
> very difficult). This leads me to think that in future dtype-overhaul
> plans, we should consider creating a subclassable `np.scalar` base type
> to wrap all numpy scalar variables, and code like `isinstance(var,
> np.floating)` should be replaced by `isinstance(var.dtype.type,
> np.floating)` or similar. That is, the numeric dtype of the scalar is no
> longer encoded in `type(var)` but in `var.dtype`: The fact that the
> variable is a numpy scalar is decoupled from its numeric dtype.
>
> This is useful because there are many "associated" properties of scalars
> in common with arrays which have nothing to do with the dtype, which
> ducktype implementors want to touch. I imagine this will come up a lot:
> In that repo I also have an "ArrayCollection" ducktype which required a
> "CollectionScalar" scalar, and similarly I imagine people implementing
> units want the units attached to the scalar, independently of the dtype.
>
> Cheers,
> Allan
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected] <mailto:[email protected]>
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
diff --git a/MaskedArray.py b/MaskedArray.py
index 2df12b4..272699f 100755
--- a/MaskedArray.py
+++ b/MaskedArray.py
@@ -904,12 +904,23 @@ class _Masked_BinOp(_Masked_UFunc):
if isinstance(initial, (MaskedScalar, MaskedX)):
raise ValueError("initial should not be masked")
- if not np.isscalar(da):
- da[ma] = self.reduce_fill(da.dtype)
- # if da is a scalar, we get correct result no matter fill
+ if 1: # two different implementations, investigate performance
+ wheremask = ~ma
+ if 'where' in kwargs:
+ wheremask |= kwargs['where']
+ kwargs['where'] = wheremask
+ if 'initial' not in kwargs:
+ kwargs['initial'] = self.reduce_fill(da.dtype)
+
+ result = self.f.reduce(da, **kwargs)
+ m = np.logical_and.reduce(ma, **mkwargs)
+ else:
+ if not np.isscalar(da):
+ da[ma] = self.reduce_fill(da.dtype)
+ # if da is a scalar, we get correct result no matter fill
- result = self.f.reduce(da, **kwargs)
- m = np.logical_and.reduce(ma, **mkwargs)
+ result = self.f.reduce(da, **kwargs)
+ m = np.logical_and.reduce(ma, **mkwargs)
## Code that might be used to support domained ufuncs. WIP
#with np.errstate(divide='ignore', invalid='ignore'):
_______________________________________________
NumPy-Discussion mailing list
[email protected]
https://mail.python.org/mailman/listinfo/numpy-discussion