Re: [Numpy-discussion] A crazy masked-array thought

2012-04-28 Thread Richard Hattersley
On 27 April 2012 17:42, Travis Oliphant tra...@continuum.io wrote:


 1) There is a lot of code out there that does not know anything about
 masks and is not used to checking for masks. It enlarges the basic
 abstraction in a way that is not backwards compatible *conceptually*.
 This smells fishy to me and I could see a lot of downstream problems from
 libraries that rely on NumPy.


That's exactly why I'd love to see plain arrays remain functionally
unchanged.

It's just a small, random sample, but here's how a few routines from NumPy
and SciPy sanitise their inputs...

numpy.trapz (aka scipy.integrate.trapz) - numpy.asanyarray
scipy.spatial.KDTree - numpy.asarray
scipy.spatial.cKDTree - numpy.ascontiguousarray
scipy.integrate.odeint - PyArray_ContiguousFromObject
scipy.interpolate.interp1d - numpy.array
scipy.interpolate.griddata - numpy.asanyarray and numpy.ascontiguousarray

So, assuming numpy.ndarray became a strict subclass of some new masked
array, it looks plausible that adding just a few checks to numpy.ndarray to
exclude the masked superclass would prevent much downstream code from
accidentally operating on masked arrays.
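
To make the sanitising behaviour concrete, here is a minimal sketch using the existing numpy.ma implementation. Note that today's numpy.ma.MaskedArray is a *subclass* of ndarray, the opposite of the hierarchy proposed here, so this only illustrates how the two sanitisers in the list treat a non-plain array; how they would treat a masked superclass is an assumption, not something shown below.

```python
import numpy as np

# A small masked array built with the existing numpy.ma module.
masked = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# numpy.asarray returns a base-class ndarray, so the mask is dropped.
plain = np.asarray(masked)

# numpy.asanyarray passes ndarray subclasses through unchanged.
passed_through = np.asanyarray(masked)

print(type(plain).__name__)           # ndarray
print(type(passed_through).__name__)  # MaskedArray
```

Routines that sanitise with numpy.asarray therefore never see the mask, while routines that use numpy.asanyarray do, which is why the choice of sanitiser matters more than it first appears.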



 2) We cannot agree on how masks should be handled and consequently don't
 have a real plan for migrating numpy.ma to use these masks.   So, we are
 just growing the API and introducing uncertainty for unclear benefit ---
 especially for the person that does not want to use masks.


I've not yet looked at how numpy.ma users could be migrated. But if we make
masked arrays a strict superclass and leave the numpy/ndarray interface and
behaviour unchanged, API growth shouldn't be an issue. End-users will be
able to completely ignore the existence of masked arrays (except for the
minority(?) for whom the ABI/re-compile issue would be relevant).


 3) Subclassing in C in Python requires that C-structures are *binary*
 compatible. This implies that all subclasses have *more* attributes than
 the superclass. The way it is currently implemented, that means that POAs
 would have these extra pointers they don't need sitting there to satisfy
 that requirement. From a C-struct perspective it therefore makes more
 sense for MAs to inherit from POAs. Ideally, that shouldn't drive the
 design, but it's part of the landscape in NumPy 1.X


I'd hate to see the logical class hierarchy inverted (or collapsed to a
single class) just to save a pointer or two from the struct. Now seems like
a golden opportunity to fix the relationship between masked and plain
arrays. I'm assuming (and implicitly checking that assumption with this
statement!) that there's far more code using the Python interface to NumPy,
than there is code using the C interface. So I'm urging that the logical
consistency of the Python interface (and even the C and Cython interfaces)
takes precedence over the C-struct memory saving.

I'm not sure I agree with "extra pointers they don't need". If we make
plain arrays a subclass of masked arrays, aren't these pointers essential
to ensure masked array methods can continue to work on plain arrays without
requiring special code paths?


 I have some ideas about how to move forward, but I'm anxiously awaiting
 the write-up that Mark and Nathaniel are working on to inform and enhance
 those ideas.


+1

As an aside, the implication of preserving the behaviour of the
numpy/ndarray interface is that masked arrays will need a *new* interface.

For example:
>>> import mumpy  # Yes - I know it's a terrible name! But I had to write *something* ... sorry! ;-)
>>> import numpy
>>> a = mumpy.array(...)  # makes a masked array
>>> b = numpy.array(...)  # makes a plain array
>>> isinstance(a, mumpy.ndarray)
True
>>> isinstance(b, mumpy.ndarray)
True
>>> isinstance(a, numpy.ndarray)
False
>>> isinstance(b, numpy.ndarray)
True
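
The same four isinstance results can be reproduced with a toy pure-Python hierarchy. This is only a sketch of the proposed relationship: the class names, the list-based storage and the `total` method are all hypothetical and say nothing about how NumPy would implement it.

```python
class MaskedArray:
    """Hypothetical general array: data plus an optional mask."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        # mask[i] is True where an element is hidden; None means no mask.
        self.mask = None if mask is None else list(mask)

    def total(self):
        """Generic implementation that honours the mask."""
        if self.mask is None:
            return sum(self.data)
        return sum(x for x, hidden in zip(self.data, self.mask) if not hidden)


class PlainArray(MaskedArray):
    """Hypothetical plain array: a masked array that never carries a mask."""

    def __init__(self, data):
        super().__init__(data, mask=None)

    def total(self):
        # There is no mask to consult, so the fast path is always safe.
        return sum(self.data)


a = MaskedArray([1, 2, 3], mask=[False, True, False])
b = PlainArray([1, 2, 3])

print(isinstance(a, MaskedArray), isinstance(b, MaskedArray))  # True True
print(isinstance(a, PlainArray), isinstance(b, PlainArray))    # False True
print(a.total(), b.total())                                    # 4 6
```

Code that insists on `PlainArray` rejects `a`, while code written against `MaskedArray` accepts both.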

Richard Hattersley
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-28 Thread Nathaniel Smith
On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley
rhatters...@gmail.com wrote:
 So, assuming numpy.ndarray became a strict subclass of some new masked
 array, it looks plausible that adding just a few checks to numpy.ndarray to
 exclude the masked superclass would prevent much downstream code from
 accidentally operating on masked arrays.

I think the main point I was trying to make is that it's the existence
and content of these checks that matters. They don't necessarily have
any relation at all to which thing Python calls a superclass or a
subclass.

-- Nathaniel


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-28 Thread Neal Becker
Nathaniel Smith wrote:

 On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley
 rhatters...@gmail.com wrote:
 So, assuming numpy.ndarray became a strict subclass of some new masked
 array, it looks plausible that adding just a few checks to numpy.ndarray to
 exclude the masked superclass would prevent much downstream code from
 accidentally operating on masked arrays.
 
 I think the main point I was trying to make is that it's the existence
 and content of these checks that matters. They don't necessarily have
 any relation at all to which thing Python calls a superclass or a
 subclass.
 
 -- Nathaniel

I don't agree with the argument that ma should be a superclass of ndarray.  It 
is ma that is adding features.  That makes it a subclass.  We're not talking 
mathematics here.

There is a well-known disease of OOP where everything seems to bubble up to the 
top of the class hierarchy - so that the base class becomes bloated to support 
every feature needed by subclasses.  I believe that's considered poor design.

Is there a way to support ma as a subclass of ndarray, without introducing 
overhead into ndarray?  Without having given this much real thought, I do have 
some idea.  What are the operations that we need on arrays?  The most basic are:

1. element access
2. get size (shape)

In an OO design, these would be virtual functions (or in C, pointers to 
functions).  But this would introduce unacceptable overhead.

In a generic programming design (C++ templates), we would essentially generate
2 copies of every function, one that operates on plain arrays, and one that
operates on masked arrays, each using the appropriate function for element
access, shape, etc.  This way, no unneeded overhead is introduced (although the
code size is increased - but this is probably of little consequence on a modern
demand-paged OS).

Following this approach, ma and ndarray don't have to have any inheritance 
relation.  OTOH, inheritance is probably useful since there are many common 
features to ma and ndarray, and a lot of code could be shared.
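
A rough Python analogue of the "two copies of every function" idea is sketched below. It is an assumption-laden illustration: Python closures are resolved at run time, so this shows only the structure (one algorithm, two element-access strategies, no inheritance), not the zero-overhead property of C++ templates. All names are hypothetical, and a masked array is modelled as a plain (data, mask) pair.

```python
def make_mean(iter_valid):
    """Build a mean() specialised for one way of iterating valid elements."""

    def mean(arr):
        total = 0.0
        count = 0
        for value in iter_valid(arr):
            total += value
            count += 1
        return total / count  # raises ZeroDivisionError if nothing is valid

    return mean


def iter_plain(arr):
    # Plain arrays: every element is valid.
    return iter(arr)


def iter_masked(arr):
    # Masked arrays are modelled as a (data, mask) pair; True hides an element.
    data, mask = arr
    return (x for x, hidden in zip(data, mask) if not hidden)


# Two "instantiations" of the same algorithm.
mean_plain = make_mean(iter_plain)
mean_masked = make_mean(iter_masked)

print(mean_plain([1.0, 2.0, 3.0]))                             # 2.0
print(mean_masked(([1.0, 2.0, 30.0], [False, False, True])))   # 1.5
```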





Re: [Numpy-discussion] A crazy masked-array thought

2012-04-28 Thread Charles R Harris
On Sat, Apr 28, 2012 at 10:58 AM, Neal Becker ndbeck...@gmail.com wrote:

 Nathaniel Smith wrote:

  On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley
  rhatters...@gmail.com wrote:
  So, assuming numpy.ndarray became a strict subclass of some new masked
  array, it looks plausible that adding just a few checks to
 numpy.ndarray to
  exclude the masked superclass would prevent much downstream code from
  accidentally operating on masked arrays.
 
  I think the main point I was trying to make is that it's the existence
  and content of these checks that matters. They don't necessarily have
  any relation at all to which thing Python calls a superclass or a
  subclass.
 
  -- Nathaniel

 I don't agree with the argument that ma should be a superclass of ndarray.
  It
 is ma that is adding features.  That makes it a subclass.  We're not
 talking
 mathematics here.


It isn't a subclass either. In a true subclass, anything that worked on the
base class would work equally well on a subclass *without modification*.
Basically, it's an independent class with special functions that can handle
combinations and ufuncs. Look at all the functions exported in
numpy/ma/core.py. Inheritance really isn't a concept appropriate to this
case. Pretty much all the functions are rewritten for masked arrays, which
is one reason maintenance is a hassle: lots of things have to be maintained
in two places.

 There is a well-known disease of OOP where everything seems to bubble up
 to the top of the class hierarchy - so that the base class becomes bloated
 to support every feature needed by subclasses.  I believe that's considered
 poor design.

 Is there a way to support ma as a subclass of ndarray, without introducing
 overhead into ndarray?  Without having given this much real thought, I do
 have
 some idea.  What are the operations that we need on arrays?  The most
 basic are:

 1. element access
 2. get size (shape)

 In an OO design, these would be virtual functions (or in C, pointers to
 functions).  But this would introduce unacceptable overhead.


Sure, and you would still have two different functions of almost everything.


 In a generic programming design (C++ templates), we would essentially
 generate 2 copies of every function, one that operates on plain arrays,
 and one that operates on masked arrays, each using the appropriate function
 for element access, shape, etc.  This way, no unneeded overhead is
 introduced (although the code size is increased - but this is probably of
 little consequence on a modern demand-paged OS).

 Following this approach, ma and ndarray don't have to have any inheritance
 relation.  OTOH, inheritance is probably useful since there are many common
 features to ma and ndarray, and a lot of code could be shared.


Not many common behaviours. Analogous behaviours, perhaps. And since
everything ends up written twice, the best way to share code is to do it in
the base class.

Chuck


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Richard Hattersley
I know I used a somewhat jokey tone in my original posting, but fundamentally
it was a serious question concerning a live topic. So I'm curious about the
lack of response. Has this all been covered before?

Sorry if I'm being too impatient!


On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com wrote:

 The masked array discussions have brought up all sorts of interesting
 topics - too many to usefully list here - but there's one aspect I haven't
 spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
 too awkward to be helpful. But ...

 Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

 In the library I'm working on, the introduction of MAs (via numpy.ma)
 required us to sweep through the library and make a fair few changes.
 That's not the sort of thing one would normally expect from the
 introduction of a subclass.

 Putting aside the ABI issue, would it help downstream API compatibility if
 the POA was a subclass of the MA? Code that's expecting/casting-to a POA
 might continue to work and, where appropriate, could be upgraded in their
 own time to accept MAs.

 Richard Hattersley



Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Benjamin Root
On Fri, Apr 27, 2012 at 6:32 AM, Richard Hattersley
rhatters...@gmail.com wrote:

 I know I used a somewhat jokey tone in my original posting, but
 fundamentally it was a serious question concerning a live topic. So I'm
 curious about the lack of response. Has this all been covered before?

 Sorry if I'm being too impatient!



Richard,

I am rather surprised by the lack of response as well.  This is quite
unusual and I hope it doesn't sour you on making more contributions.  We do
need more crazy ideas like yours, if only to help break out of an infinite
loop in a discussion.

Your idea is interesting, but doesn't it require C++?  Or maybe you are
thinking of creating a new C type object that would contain all the new
features and hold a pointer and function interface to the original POA.
Essentially, the new type would act as a wrapper around the original
ndarray?

Cheers!
Ben Root


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Nathaniel Smith
On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley
rhatters...@gmail.com wrote:
 I know I used a somewhat jokey tone in my original posting, but fundamentally
 it was a serious question concerning a live topic. So I'm curious about the
 lack of response. Has this all been covered before?

 Sorry if I'm being too impatient!

That's fine, I know I did read it, but I wasn't sure what to make of
it to respond :-)

 On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com wrote:

 The masked array discussions have brought up all sorts of interesting
 topics - too many to usefully list here - but there's one aspect I haven't
 spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
 too awkward to be helpful. But ...

 Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

 In the library I'm working on, the introduction of MAs (via numpy.ma)
 required us to sweep through the library and make a fair few changes. That's
 not the sort of thing one would normally expect from the introduction of a
 subclass.

 Putting aside the ABI issue, would it help downstream API compatibility if
 the POA was a subclass of the MA? Code that's expecting/casting-to a POA
 might continue to work and, where appropriate, could be upgraded in their
 own time to accept MAs.

This makes a certain amount of sense from a traditional OO modeling
perspective, where classes are supposed to refer to sets of objects
and subclasses are subsets and superclasses are supersets. This is the
property that's needed to guarantee that if A is a subclass of B, then
any code that expects a B can also handle an A, since all A's are B's,
which is what you need if you're doing type-checking or type-based
dispatch. And indeed, from this perspective, MAs are a superclass of
POAs, because for every POA there's an equivalent MA (the one with the
mask set to all-true), but not vice-versa.

But, that model of OO doesn't have much connection to Python. In
Python's semantics, classes are almost irrelevant; they're mostly just
some convenience tools for putting together the objects you want, and
what really matters is the behavior of each object (the famous "duck
typing"). You can call isinstance() if you want, but it's just an
ordinary function that looks at some attributes on an object; the only
magic involved is that some of those attributes have underscores in
their name. In Python, subclassing mostly does two things: (1) it's a
quick way to set up a class that's similar to another class
(though this is a worse idea than it looks -- you're basically doing
'from other_class import *' with all the usual tight-coupling problems
that 'import *' brings). (2) When writing Python objects at the C
level, subclassing lets you achieve memory layout compatibility (which
is important because C does *not* do duck typing), and it lets you add
new fields to a C struct.

So at this level, MAs are a subclass of POAs, because MAs have an
extra field that POAs don't...

So I don't know what to think about subclasses/superclasses here,
because they're such confusing and contradictory concepts that it's
hard to tell what the actual resulting API semantics would be.
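
One way to see how loosely isinstance() is tied to inheritance in Python is the standard-library abc module, which lets a class be registered as a "virtual subclass". The sketch below uses hypothetical stand-in classes, not NumPy types.

```python
from abc import ABC


class MaskAware(ABC):
    """Hypothetical marker type meaning 'this object may carry a mask'."""


class PlainArray:
    """Stand-in for a plain array; it does not inherit from MaskAware."""


class MaskedArray:
    """Stand-in for a masked array; it does not inherit from PlainArray."""


# Registration makes isinstance() succeed without any inheritance link.
MaskAware.register(PlainArray)
MaskAware.register(MaskedArray)

print(isinstance(PlainArray(), MaskAware))    # True
print(isinstance(MaskedArray(), MaskAware))   # True
print(MaskAware in PlainArray.__mro__)        # False
```

The result of the check is decided by what the check looks at, not by which class appears above which in the hierarchy.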

- N


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Charles R Harris
On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley
rhatters...@gmail.com wrote:

 The masked array discussions have brought up all sorts of interesting
 topics - too many to usefully list here - but there's one aspect I haven't
 spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
 too awkward to be helpful. But ...

 Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

 In the library I'm working on, the introduction of MAs (via numpy.ma)
 required us to sweep through the library and make a fair few changes.
 That's not the sort of thing one would normally expect from the
 introduction of a subclass.

 Putting aside the ABI issue, would it help downstream API compatibility if
 the POA was a subclass of the MA? Code that's expecting/casting-to a POA
 might continue to work and, where appropriate, could be upgraded in their
 own time to accept MAs.


That's a version of the idea that all arrays have masks, just some of them
have missing masks. That construction was mentioned in the thread but I
can see how one might have missed it. I think it is the right way to do
things. However, current libraries and such will still need to do some work
in order not to do the wrong thing when a real mask is present. For
instance, check and raise an error if they can't deal with it.

Chuck


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Charles R Harris
On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley rhatters...@gmail.com
  wrote:

 The masked array discussions have brought up all sorts of interesting
 topics - too many to usefully list here - but there's one aspect I haven't
 spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
 too awkward to be helpful. But ...

 Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

 In the library I'm working on, the introduction of MAs (via numpy.ma)
 required us to sweep through the library and make a fair few changes.
 That's not the sort of thing one would normally expect from the
 introduction of a subclass.

 Putting aside the ABI issue, would it help downstream API compatibility
 if the POA was a subclass of the MA? Code that's expecting/casting-to a POA
 might continue to work and, where appropriate, could be upgraded in their
 own time to accept MAs.


 That's a version of the idea that all arrays have masks, just some of them
 have missing masks. That construction was mentioned in the thread but I
 can see how one might have missed it. I think it is the right way to do
 things. However, current libraries and such will still need to do some work
 in order to not do the wrong thing when a real mask was present. For
 instance, check and raise an error if they can't deal with it.


To expand a bit more, this is precisely why the current work on making
masks part of ndarray rather than a subclass was undertaken. There is a
flag that says whether or not the array is masked, but you will still need
to check that flag to see if you are working with an unmasked instance of
ndarray. At the moment the masked version isn't quite completely fused with
ndarrays-classic since the maskedness needs to be specified in the
constructors and such, but what you suggest is actually what we are working
towards.

No matter what is done, current functions and libraries that want to use
masks are going to have to deal with the existence of both masked and
unmasked arrays since the existence of a mask can't be ignored without
risking wrong results.
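
A small illustration of the "wrong results" risk, using the existing numpy.ma module rather than the in-progress ndarray masks discussed here (so treat it as an analogy, not a description of the new code):

```python
import numpy as np

# The third reading is flagged as invalid.
readings = np.ma.array([1.0, 2.0, 1000.0], mask=[False, False, True])

# Mask-aware mean: the masked reading is excluded.
aware = readings.mean()

# A routine that sanitises its input with numpy.asarray sees the raw data,
# including the value that was meant to be hidden.
unaware = np.asarray(readings).mean()

print(aware)    # 1.5
print(unaware)  # roughly 334.33
```

Nothing fails or warns in the second case; the answer is simply different.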

Chuck


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread josef . pktd
On Fri, Apr 27, 2012 at 10:33 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:



 On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley
 rhatters...@gmail.com wrote:

 The masked array discussions have brought up all sorts of interesting
 topics - too many to usefully list here - but there's one aspect I haven't
 spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
 too awkward to be helpful. But ...

 Shouldn't masked arrays (MA) be a superclass of the plain-old-array
 (POA)?

 In the library I'm working on, the introduction of MAs (via numpy.ma)
 required us to sweep through the library and make a fair few changes. That's
 not the sort of thing one would normally expect from the introduction of a
 subclass.

 Putting aside the ABI issue, would it help downstream API compatibility
 if the POA was a subclass of the MA? Code that's expecting/casting-to a POA
 might continue to work and, where appropriate, could be upgraded in their
 own time to accept MAs.


 That's a version of the idea that all arrays have masks, just some of them
 have missing masks. That construction was mentioned in the thread but I
 can see how one might have missed it. I think it is the right way to do
 things. However, current libraries and such will still need to do some work
 in order to not do the wrong thing when a real mask was present. For
 instance, check and raise an error if they can't deal with it.


 To expand a bit more, this is precisely why the current work on making masks
 part of ndarray rather than a subclass was undertaken. There is a flag that
 says whether or not the array is masked, but you will still need to check
 that flag to see if you are working with an unmasked instance of ndarray. At
 the moment the masked version isn't quite completely fused with
 ndarrays-classic since the maskedness needs to be specified in the
 constructors and such, but what you suggest is actually what we are working
 towards.

 No matter what is done, current functions and libraries that want to use
 masks are going to have to deal with the existence of both masked and
 unmasked arrays since the existence of a mask can't be ignored without
 risking wrong results.

(In case it's not the wrong thread)

If every ndarray has this maskflag, then it is easy to adjust other
library code.

if myarr.maskflag is not None: raise SorryException

What is expensive is having to do np.isnan(myarr) or
np.isfinite(myarr) everywhere.
https://github.com/scipy/scipy/pull/48

As a concept I like the idea, masked arrays are the general class with
generic defaults, clean arrays are a subclass where some methods are
overwritten with faster implementations.
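
Spelled out a little, the guard might look like the sketch below. `maskflag` is the hypothetical attribute from the one-liner above, and `MaskedInputError` and `FakeArray` are made-up names used only to exercise it. The point is that the check is a single attribute lookup, in contrast to scanning every element with np.isnan or np.isfinite.

```python
class MaskedInputError(TypeError):
    """Raised by routines that cannot handle masked input."""


def require_unmasked(arr):
    """Return arr unchanged, or refuse it if it carries a mask.

    `maskflag` is a hypothetical attribute: None means "no mask",
    anything else means a mask is attached.
    """
    if getattr(arr, "maskflag", None) is not None:
        raise MaskedInputError("this routine does not support masked arrays")
    return arr


class FakeArray:
    """Tiny stand-in used only to exercise the guard."""

    def __init__(self, maskflag=None):
        self.maskflag = maskflag


clean = FakeArray()
masked = FakeArray(maskflag=[False, True])

assert require_unmasked(clean) is clean
try:
    require_unmasked(masked)
except MaskedInputError as exc:
    print("rejected:", exc)
```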

Josef


 Chuck



Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Richard Hattersley
Hi all,

Thanks for all your responses and for your patience with a newcomer. Don't
worry - I'm not going to give up yet. It's all just part of my learning the
ropes.

On 27 April 2012 14:05, Benjamin Root ben.r...@ou.edu wrote:

 <snip>Your idea is interesting, but doesn't it require C++?  Or maybe you
 are thinking of creating a new C type object that would contain all the new
 features and hold a pointer and function interface to the original POA.
 Essentially, the new type would act as a wrapper around the original
 ndarray?</snip>

When talking about subclasses I'm just talking about the end-user
experience within Python. In other words, I'm starting from issubclass(POA,
MA) == True, and trying to figure out what the Python API implications
would be.


On 27 April 2012 14:55, Nathaniel Smith n...@pobox.com wrote:

 On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley
 rhatters...@gmail.com wrote:
  I know I used a somewhat jokey tone in my original posting, but
 fundamentally
  it was a serious question concerning a live topic. So I'm curious about
 the
  lack of response. Has this all been covered before?
 
  Sorry if I'm being too impatient!

 That's fine, I know I did read it, but I wasn't sure what to make of
 it to respond :-)

  On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com
 wrote:
 
  The masked array discussions have brought up all sorts of interesting
  topics - too many to usefully list here - but there's one aspect I
 haven't
  spotted yet. Perhaps that's because it's flat out wrong, or crazy, or
 just
  too awkward to be helpful. But ...
 
  Shouldn't masked arrays (MA) be a superclass of the plain-old-array
 (POA)?
 
  In the library I'm working on, the introduction of MAs (via numpy.ma)
  required us to sweep through the library and make a fair few changes.
 That's
  not the sort of thing one would normally expect from the introduction
 of a
  subclass.
 
  Putting aside the ABI issue, would it help downstream API compatibility
 if
  the POA was a subclass of the MA? Code that's expecting/casting-to a POA
  might continue to work and, where appropriate, could be upgraded in
 their
  own time to accept MAs.

 This makes a certain amount of sense from a traditional OO modeling
 perspective, where classes are supposed to refer to sets of objects
 and subclasses are subsets and superclasses are supersets. This is the
 property that's needed to guarantee that if A is a subclass of B, then
 any code that expects a B can also handle an A, since all A's are B's,
 which is what you need if you're doing type-checking or type-based
 dispatch. And indeed, from this perspective, MAs are a superclass of
 POAs, because for every POA there's an equivalent MA (the one with the
 mask set to all-true), but not vice-versa.

 But, that model of OO doesn't have much connection to Python. In
 Python's semantics, classes are almost irrelevant; they're mostly just
 some convenience tools for putting together the objects you want, and
 what really matters is the behavior of each object (the famous "duck
 typing"). You can call isinstance() if you want, but it's just an
 ordinary function that looks at some attributes on an object; the only
 magic involved is that some of those attributes have underscores in
 their name. In Python, subclassing mostly does two things: (1) it's a
 quick way to set up a class that's similar to another class
 (though this is a worse idea than it looks -- you're basically doing
 'from other_class import *' with all the usual tight-coupling problems
 that 'import *' brings). (2) When writing Python objects at the C
 level, subclassing lets you achieve memory layout compatibility (which
 is important because C does *not* do duck typing), and it lets you add
 new fields to a C struct.

 So at this level, MAs are a subclass of POAs, because MAs have an
 extra field that POAs don't...

 So I don't know what to think about subclasses/superclasses here,
 because they're such confusing and contradictory concepts that it's
 hard to tell what the actual resulting API semantics would be.


It doesn't seem essential that MAs have an extra field that POAs don't. If
POA was a subclass of MA, instances of POA could have the extra field set
to an all-valid/nothing-is-masked value. Granted, you'd want that to be
a special value so you're not lugging around a load of redundant data (and
you can optimise your processing for that), but I'm guessing you'd probably
want that kind of capability within MA anyway.
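
For what it's worth, the existing numpy.ma module already uses a special "nothing is masked" value of this kind, numpy.ma.nomask. The short sketch below shows it; whether a redesigned MA/POA pair would reuse the same mechanism is purely an assumption.

```python
import numpy as np

# No mask given: numpy.ma stores the shared nomask sentinel, not a full array.
unmasked = np.ma.array([1, 2, 3])
print(np.ma.getmask(unmasked) is np.ma.nomask)   # True

# An explicit mask is stored as a boolean array of the same shape.
partly = np.ma.array([1, 2, 3], mask=[False, True, False])
print(np.ma.getmask(partly) is np.ma.nomask)     # False
print(np.ma.getmask(partly).shape)               # (3,)

# A plain ndarray is also reported as having no mask.
print(np.ma.getmask(np.array([1, 2, 3])) is np.ma.nomask)   # True
```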


On 27 April 2012 15:33, Charles R Harris charlesr.har...@gmail.com wrote:



 On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley 
 rhatters...@gmail.com wrote:

 The masked array discussions have brought up all sorts of interesting
 topics - too many to usefully list here - but there's one aspect I haven't
 spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
 too 

Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Charles R Harris
On Fri, Apr 27, 2012 at 9:16 AM, josef.p...@gmail.com wrote:

 On Fri, Apr 27, 2012 at 10:33 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
 
  On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley
  rhatters...@gmail.com wrote:
 
  The masked array discussions have brought up all sorts of interesting
  topics - too many to usefully list here - but there's one aspect I
 haven't
  spotted yet. Perhaps that's because it's flat out wrong, or crazy, or
 just
  too awkward to be helpful. But ...
 
  Shouldn't masked arrays (MA) be a superclass of the plain-old-array
  (POA)?
 
  In the library I'm working on, the introduction of MAs (via numpy.ma)
  required us to sweep through the library and make a fair few changes.
 That's
  not the sort of thing one would normally expect from the introduction
 of a
  subclass.
 
  Putting aside the ABI issue, would it help downstream API compatibility
  if the POA was a subclass of the MA? Code that's expecting/casting-to
 a POA
  might continue to work and, where appropriate, could be upgraded in
 their
  own time to accept MAs.
 
 
  That's a version of the idea that all arrays have masks, just some of
 them
  have missing masks. That construction was mentioned in the thread but
 I
  can see how one might have missed it. I think it is the right way to do
  things. However, current libraries and such will still need to do some
 work
  in order to not do the wrong thing when a real mask was present. For
  instance, check and raise an error if they can't deal with it.
 
 
  To expand a bit more, this is precisely why the current work on making
 masks
  part of ndarray rather than a subclass was undertaken. There is a flag
 that
  says whether or not the array is masked, but you will still need to check
  that flag to see if you are working with an unmasked instance of
 ndarray. At
  the moment the masked version isn't quite completely fused with
  ndarrays-classic since the maskedness needs to be specified in the
  constructors and such, but what you suggest is actually what we are
 working
  towards.
 
  No matter what is done, current functions and libraries that want to use
  masks are going to have to deal with the existence of both masked and
  unmasked arrays since the existence of a mask can't be ignored without
  risking wrong results.

 (In case it's not the wrong thread)

 If every ndarray has this maskflag, then it is easy to adjust other
 library code.


That is the case.

In [1]: ones(1).flags
Out[1]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  MASKNA : False
  OWNMASKNA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

What I'd like to add is that the mask is only allocated when NA (or
equivalent) is assigned. That way the flag also signals the actual presence
of a masked value.
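
The allocate-on-first-NA idea can be sketched in pure Python as follows. The class, its `has_mask` property and its `mask_element` method are hypothetical; they only mimic the behaviour described, with `has_mask` playing the role of the flag.

```python
class LazyMaskedArray:
    """Hypothetical array whose mask is allocated only on first use."""

    def __init__(self, data):
        self.data = list(data)
        self._mask = None  # nothing allocated yet

    @property
    def has_mask(self):
        # Plays the role of the flag: True only once something was masked.
        return self._mask is not None

    def mask_element(self, index):
        if self._mask is None:
            # Allocate on demand, the first time a value is masked.
            self._mask = [False] * len(self.data)
        self._mask[index] = True


arr = LazyMaskedArray([1.0, 2.0, 3.0])
print(arr.has_mask)   # False

arr.mask_element(1)
print(arr.has_mask)   # True
```

With this arrangement a false flag guarantees there is no masked value, although a true flag does not say how many elements are masked.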


 if myarr.maskflag is not None: raise SorryException

 What is expensive is having to do np.isnan(myarr) or
 np.isfinite(myarr) everywhere.
 https://github.com/scipy/scipy/pull/48

 As a concept I like the idea, masked arrays are the general class with
 generic defaults, clean arrays are a subclass where some methods are
 overwritten with faster implementations.


Chuck


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Travis Oliphant

On Apr 25, 2012, at 10:58 AM, Richard Hattersley wrote:

 The masked array discussions have brought up all sorts of interesting topics 
 - too many to usefully list here - but there's one aspect I haven't spotted 
 yet. Perhaps that's because it's flat out wrong, or crazy, or just too 
 awkward to be helpful. But ...
 
 Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

Ultimately, this is what Chuck and Mark are advocating, I believe.  It's not
a crazy idea.  In fact, it's probably more correct in that masked arrays *are*
more general than POAs.  If we were starting from scratch in 1994 (Numeric
days), I could see taking this route and setting expectations correctly for
downstream libraries.

There are three problems I see with jamming this concept into NumPy 1.X, 
however, by modifying all POA data-structures to now *be* masked arrays. 

1) There is a lot of code out there that does not know anything about
masks and is not used to checking for masks.  It enlarges the basic
abstraction in a way that is not backwards compatible *conceptually*.  This
smells fishy to me and I could see a lot of downstream problems from libraries
that rely on NumPy.

2) We cannot agree on how masks should be handled and consequently 
don't have a real plan for migrating numpy.ma to use these masks.   So, we are 
just growing the API and introducing uncertainty for unclear benefit --- 
especially for the person that does not want to use masks.   

3) Subclassing at the C level in Python requires that C-structures are
*binary* compatible.  This implies that all subclasses have *more* attributes
than the superclass.  The way it is currently implemented, that means that
POAs would have these extra pointers they don't need sitting there to satisfy
that requirement.  From a C-struct perspective it therefore makes more sense
for MAs to inherit from POAs.  Ideally, that shouldn't drive the design, but
it's part of the landscape in NumPy 1.X.
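
To make point 3 concrete, here is a rough sketch using ctypes, which lays out Structure subclasses the way C struct "inheritance" works: the subclass's fields are appended after the base fields. The two struct classes and their field names are hypothetical, heavily simplified stand-ins, not the real NumPy array struct.

```python
import ctypes


class PlainArrayStruct(ctypes.Structure):
    # Hypothetical, simplified stand-in for a plain array struct.
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("nd", ctypes.c_int),
    ]


class MaskedArrayStruct(PlainArrayStruct):
    # Fields declared by a subclass are appended after the base fields.
    _fields_ = [
        ("mask_data", ctypes.c_void_p),
    ]


# The base-class fields keep their offsets, so code written against
# PlainArrayStruct can still read the leading part of a
# MaskedArrayStruct.  The subclass can only grow the struct.
print(PlainArrayStruct.data.offset, MaskedArrayStruct.data.offset)
print(PlainArrayStruct.nd.offset, MaskedArrayStruct.nd.offset)
print(ctypes.sizeof(PlainArrayStruct), ctypes.sizeof(MaskedArrayStruct))
```

Turning the hierarchy around, so that the plain array is the subclass, would force the plain struct to carry the mask pointer as well, which is the cost described in point 3.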

I have some ideas about how to move forward, but I'm anxiously awaiting the 
write-up that Mark and Nathaniel are working on to inform and enhance those 
ideas.

Masked arrays do have a long history in the Numeric and NumPy code base.  Paul
Dubois originally created the first masked arrays in Numeric and helped move
them to numpy.ma.  Pierre GM took that code and worked very hard to add a lot
of features.  I'm very concerned about adding a new masked array abstraction
into the *core* of all NumPy arrays, especially one that is not well informed
by this history or its user base.

I was just visiting LLNL a couple of weeks ago and realized that they are using
masked arrays very heavily in UV-CDAT and elsewhere.  I've also seen many
other people in industry, academia, and government use masked arrays.  I've
typically squirmed at that because I know that masked arrays have performance
issues because they are in Python.  I've also wondered about masked arrays as
*subclasses* of POAs because of how much code has to be rewritten in the
sub-class for it to work correctly.

So, in summary: my view is that NumPy has masked arrays already (and has had
them for a long, long time).  Missing data is only one of the use-cases for
masked arrays (though it is probably the dominant use case for numpy.ma).
Independent of the missing-data story, any plan to add masks directly to a
base-object in NumPy needs to take into account the numpy.ma user-base and the
POA user-base that does not expect to be dealing with masks.

That doesn't mean it needs to follow numpy.ma design choices and API.  It
does, however, need to think about how a typical numpy.ma user could instead
use the new masked array concept, and how numpy.ma itself could be revised to
use the new masked array concept.

I think Mark has done some amazing coding and I would like to keep as much of 
it as possible available to people.   We may need to adjust *how* it is 
presented downstream, but I'm hopeful that we can do that. 

Thanks for your ideas and your comments. 

-Travis



 
 In the library I'm working on, the introduction of MAs (via numpy.ma) 
 required us to sweep through the library and make a fair few changes. That's 
 not the sort of thing one would normally expect from the introduction of a 
 subclass.
 
 Putting aside the ABI issue, would it help downstream API compatibility if 
 the POA was a subclass of the MA? Code that's expecting/casting-to a POA 
 might continue to work and, where appropriate, could be upgraded in their own 
 time to accept MAs.
 
 Richard Hattersley
