Re: [Numpy-discussion] new MaskedArray class

Allan Haldane Thu, 20 Jun 2019 09:44:44 -0700

On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> Hi Allan,
> 
> This is very impressive! I could get the tests that I wrote for my class
> pass with yours using Quantity with what I would consider very minimal
> changes. I only could not find a good way to unmask data (I like the
> idea of setting the mask on some elements via `ma[item] = X`); is this
> on purpose?


Yes, I want to make it difficult for the user to access the garbage
values under the mask, which are often clobbered values. The only way to
"remove" a masked value is by replacing it with a new non-masked value.


> Anyway, it would seem easily at the point where I should comment on your
> repository rather than in the mailing list!

To make further progress on this encapsulation idea I need a more
complete ducktype to pass into MaskedArray to test, so that's what I'll
work on next, when I have time. I'll either try to finish my
ArrayCollection type, or try making a simple NDunit ducktype
piggybacking on astropy's Unit.

Best,
Allan


> 
> All the best,
> 
> Marten
> 
> 
> On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
>     >
>     >
>     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
>     <[email protected] <mailto:[email protected]>
>     > <mailto:[email protected] <mailto:[email protected]>>>
>     wrote:
>     > <snip>
>     >
>     >     > This may be too much to ask from the initializer, but, if
>     so, it still
>     >     > seems most useful if it is made as easy as possible to do,
>     say, `class
>     >     > MaskedQuantity(Masked, Quantity): <very few overrides>`.
>     >
>     >     Currently MaskedArray does not accept ducktypes as underlying
>     arrays,
>     >     but I think it shouldn't be too hard to modify it to do so.
>     Good idea!
>     >
>     >
>     > Looking back at my trial, I see that I also never got to duck arrays -
>     > only ndarray subclasses - though I tried to make the code as
>     agnostic as
>     > possible.
>     >
>     > (Trial at
>     >
>     
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1)
>     >
>     >     I already partly navigated this mixin-issue in the
>     >     "MaskedArrayCollection" class, which essentially does
>     >     ArrayCollection(MaskedArray(array)), and only takes about 30
>     lines of
>     >     boilerplate. That's the backwards encapsulation order from
>     what you want
>     >     though.
>     >
>     >
>     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
>     > mask=[True, False, False])` does indeed not have a `.unit` attribute
>     > (and cannot represent itself...); I'm not at all sure that my
>     method of
>     > just creating a mixed class is anything but a recipe for disaster,
>     though!
> 
>     Based on your suggestion I worked on this a little today, and now my
>     MaskedArray more easily encapsulates both ducktypes and ndarray
>     subclasses (pushed to repo). Here's an example I got working with masked
>     units using unyt:
> 
>     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
> 
>     [2]: from unyt import m, km
> 
>     [3]: import numpy as np
> 
>     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
> 
>     [5]: uarr
> 
>     MaskedArray([1., X , 3.])
>     [6]: uarr + 1*m
> 
>     MaskedArray([1.001, X    , 3.001])
>     [7]: uarr.filled()
> 
>     unyt_array([1., 0., 3.], 'km')
>     [8]: np.concatenate([uarr, 2*uarr]).filled()
>     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
> 
>     The catch is the ducktype/subclass has to rigorously follow numpy's
>     indexing rules, including distinguishing 0d arrays from scalars. For now
>     only I used unyt in the example above since it happens to be less strict
>      about dimensionless operations than astropy.units which trips up my
>     repr code. (see below for example with astropy.units). Note in the last
>     line I lost the dimensions, but that is because unyt does not handle
>     np.concatenate. To get that to work we need a true ducktype for units.
> 
>     The example above doesn't expose the ".units" attribute outside the
>     MaskedArray, and it doesn't print the units in the repr. But you can
>     access them using "filled".
> 
>     While I could make MaskedArray forward unknown attribute accesses to the
>     encapsulated array, that seems a bit dangerous/bug-prone at first
>     glance, so probably I want to require the user to make a MaskedArray
>     subclass to do so. I've just started playing with that (probably buggy),
>     and Ive attached subclass examples for astropy.unit and unyt, with some
>     example output below.
> 
>     Cheers,
>     Allan
> 
> 
> 
>     Example using the attached astropy unit subclass:
> 
>         >>> from astropy.units import m, km, s
>         >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>         >>> uarr
>         MaskedQ([1., X , 1.], units=km)
>         >>> uarr.units
>         km
>         >>> uarr + (1*m)
>         MaskedQ([1.001, X    , 1.001], units=km)
>         >>> uarr/(1*s)
>         MaskedQ([1., X , 1.], units=km / s)
>         >>> (uarr*(1*m))[1:]
>         MaskedQ([X , 1.], units=km m)
>         >>> np.add.outer(uarr, uarr)
>         MaskedQ([[2., X , 2.],
>                  [X , X , X ],
>                  [2., X , 2.]], units=km)
>         >>> print(uarr)
>         [1. X  1.] km m
> 
>     Cheers,
>     Allan
> 
> 
>     >     > Even if this impossible, I think it is conceptually useful
>     to think
>     >     > about what the masking class should do. My sense is that,
>     e.g., it
>     >     > should not attempt to decide when an operation succeeds or not,
>     >     but just
>     >     > "or together" input masks for regular, multiple-input functions,
>     >     and let
>     >     > the underlying arrays skip elements for reductions by using
>     `where`
>     >     > (hey, I did implement that for a reason... ;-). In
>     particular, it
>     >     > suggests one should not have things like domains and all
>     that (I never
>     >     > understood why `MaskedArray` did that). If one wants more,
>     the class
>     >     > should provide a method that updates the mask (a sensible
>     default
>     >     might
>     >     > be `mask |= ~np.isfinite(result)` - here, the class being masked
>     >     should
>     >     > logically support ufuncs and functions, so it can decide what
>     >     "isfinite"
>     >     > means).
>     >
>     >     I agree it would be nice to remove domains. It would make life
>     easier,
>     >     and I could remove a lot of twiddly code! I kept it in for now to
>     >     minimize the behavior changes from the old MaskedArray.
>     >
>     >
>     > That makes sense. Could be separated out to a backwards-compatibility
>     > class later.
>     >
>     >
>     >     > In any case, I would think that a basic truth should be that
>     >     everything
>     >     > has a mask with a shape consistent with the data, so
>     >     > 1. Each complex numbers has just one mask, and setting
>     `a.imag` with a
>     >     > masked array should definitely propagate the mask.
>     >     > 2. For a masked array with structured dtype, I'd similarly say
>     >     that the
>     >     > default is for a mask to have the same shape as the array.
>     But that
>     >     > something like your collection makes sense for the case
>     where one
>     >     wants
>     >     > to mask items in a structure.
>     >
>     >     Agreed that we should have a single bool per complex or structured
>     >     element, and the mask shape is the same as the array shape.
>     That's how I
>     >     implemented it. But there is still a problem with complex.imag
>     >     assignment:
>     >
>     >         >>> a = MaskedArray([1j, 2, X])
>     >         >>> i = a.imag
>     >         >>> i[:] = MaskedArray([1, X, 1])
>     >
>     >     If we make the last line copy the mask to the original array, what
>     >     should the real part of a[2] be? Conversely, if we don't copy
>     the mask,
>     >     what should the imag part of a[1] be? It seems like we might
>     "want" the
>     >     masks to be OR'd instead, but then should i[2] be masked after
>     we just
>     >     set it to 1?
>     >
>     > Ah, I see the issue now... Easiest to implement and closest in analogy
>     > to a regular view would be to just let it unmask a[2] (with
>     whatever is
>     > in real; user beware!).
>     >
>     > Perhaps better would be to special-case such that `imag` returns a
>     > read-only view of the mask. Making `imag` itself read-only would
>     prevent
>     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but
>     there is
>     > no reason this should update the mask.
>     >
>     > Still, neither is really satisfactory...
>     >  
>     >
>     >
>     >     > p.s. I started trying to implement the above "Mixin" class; will
>     >     try to
>     >     > clean that up a bit so that at least it uses `where` and
>     push it up.
>     >
>     >     I played with "where", but didn't include it since 1.17 is not
>     released.
>     >     To avoid duplication of effort, I've attached a diff of what I
>     tried. I
>     >     actually get a slight slowdown of about 10% by using where...
>     >
>     >
>     > Your implementation is indeed quite similar to what I got in
>     > __array_ufunc__ (though one should "&" the where with ~mask).
>     >
>     > I think the main benefit is not to presume that whatever is underneath
>     > understands 0 or 1, i.e., avoid filling.
>     >
>     >
>     >     If you make progress with the mixin, a push is welcome. I
>     imagine a
>     >     problem is going to be that np.isscalar doesn't work to detect
>     duck
>     >     scalars.
>     >
>     > I fear that in my attempts I've simply decided that only array scalars
>     > exist...
>     >
>     > -- Marten
>     >
>     > _______________________________________________
>     > NumPy-Discussion mailing list
>     > [email protected] <mailto:[email protected]>
>     > https://mail.python.org/mailman/listinfo/numpy-discussion
>     >
> 
>     _______________________________________________
>     NumPy-Discussion mailing list
>     [email protected] <mailto:[email protected]>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 

_______________________________________________
NumPy-Discussion mailing list
[email protected]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] new MaskedArray class

Reply via email to