Re: [Numpy-discussion] A crazy masked-array thought
On 27 April 2012 17:42, Travis Oliphant tra...@continuum.io wrote:
> 1) There is a lot of code out there that does not know anything about masks and is not used to checking for masks. It enlarges the basic abstraction in a way that is not backwards compatible *conceptually*. This smells fishy to me and I could see a lot of downstream problems from libraries that rely on NumPy.

That's exactly why I'd love to see plain arrays remain functionally unchanged.

It's just a small, random sample, but here's how a few routines from NumPy and SciPy sanitise their inputs...

numpy.trapz (aka scipy.integrate.trapz) - numpy.asanyarray
scipy.spatial.KDTree - numpy.asarray
scipy.spatial.cKDTree - numpy.ascontiguousarray
scipy.integrate.odeint - PyArray_ContiguousFromObject
scipy.interpolate.interp1d - numpy.array
scipy.interpolate.griddata - numpy.asanyarray, numpy.ascontiguousarray

So, assuming numpy.ndarray became a strict subclass of some new masked array, it looks plausible that adding just a few checks to numpy.ndarray to exclude the masked superclass would prevent much downstream code from accidentally operating on masked arrays.

> 2) We cannot agree on how masks should be handled and consequently don't have a real plan for migrating numpy.ma to use these masks. So, we are just growing the API and introducing uncertainty for unclear benefit --- especially for the person that does not want to use masks.

I've not yet looked at how numpy.ma users could be migrated. But if we make masked arrays a strict superclass and leave the numpy/ndarray interface and behaviour unchanged, API growth shouldn't be an issue. End-users will be able to completely ignore the existence of masked arrays (except for the minority(?) for whom the ABI/re-compile issue would be relevant).

> 3) Subclassing in C in Python requires that C-structures are *binary* compatible. This implies that all subclasses have *more* attributes than the superclass. The way it is currently implemented, that means that POAs would have these extra pointers they don't need sitting there to satisfy that requirement. From a C-struct perspective it therefore makes more sense for MAs to inherit from POAs. Ideally, that shouldn't drive the design, but it's part of the landscape in NumPy 1.X.

I'd hate to see the logical class hierarchy inverted (or collapsed to a single class) just to save a pointer or two from the struct. Now seems like a golden opportunity to fix the relationship between masked and plain arrays. I'm assuming (and implicitly checking that assumption with this statement!) that there's far more code using the Python interface to NumPy than there is code using the C interface. So I'm urging that the logical consistency of the Python interface (and even the C and Cython interfaces) takes precedence over the C-struct memory saving.

I'm not sure I agree with "extra pointers they don't need". If we make plain arrays a subclass of masked arrays, aren't these pointers essential to ensure masked array methods can continue to work on plain arrays without requiring special code paths?

> I have some ideas about how to move forward, but I'm anxiously awaiting the write-up that Mark and Nathaniel are working on to inform and enhance those ideas.

+1

As an aside, the implication of preserving the behaviour of the numpy/ndarray interface is that masked arrays will need a *new* interface. For example:

    >>> import mumpy  # Yes - I know it's a terrible name! But I had to write *something* ... sorry! ;-)
    >>> import numpy
    >>> a = mumpy.array(...)  # makes a masked array
    >>> b = numpy.array(...)  # makes a plain array
    >>> isinstance(a, mumpy.ndarray)
    True
    >>> isinstance(b, mumpy.ndarray)
    True
    >>> isinstance(a, numpy.ndarray)
    False
    >>> isinstance(b, numpy.ndarray)
    True

Richard Hattersley
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
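The isinstance results listed in the aside can be reproduced with two toy classes. This is only a sketch of the proposed relationship, not NumPy code: `MaskedArray` and `PlainArray` are hypothetical stand-ins for `mumpy.ndarray` and `numpy.ndarray`, and using `mask=None` to mean "nothing is masked" is an assumption made for the illustration.

```python
class MaskedArray:
    """Hypothetical general class (stand-in for mumpy.ndarray)."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        # Assumption: None means "no element is masked".
        self.mask = None if mask is None else list(mask)

    def valid(self):
        """Return the elements that are not masked."""
        if self.mask is None:
            return list(self.data)
        return [x for x, m in zip(self.data, self.mask) if not m]


class PlainArray(MaskedArray):
    """Hypothetical plain array (stand-in for numpy.ndarray): never masked."""

    def __init__(self, data):
        super().__init__(data, mask=None)

    def valid(self):
        # Nothing can be masked here, so the mask lookup is skipped.
        return list(self.data)


a = MaskedArray([1, 2, 3], mask=[False, True, False])
b = PlainArray([1, 2, 3])

# Same four checks as in the message, in the same order.
results = (
    isinstance(a, MaskedArray),
    isinstance(b, MaskedArray),
    isinstance(a, PlainArray),
    isinstance(b, PlainArray),
)
```

Code that checks for `PlainArray` would reject `a`, which is the "exclude the masked superclass" behaviour the message relies on.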
Re: [Numpy-discussion] A crazy masked-array thought
On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley rhatters...@gmail.com wrote:
> So, assuming numpy.ndarray became a strict subclass of some new masked array, it looks plausible that adding just a few checks to numpy.ndarray to exclude the masked superclass would prevent much downstream code from accidentally operating on masked arrays.

I think the main point I was trying to make is that it's the existence and content of these checks that matters. They don't necessarily have any relation at all to which thing Python calls a superclass or a subclass.

-- Nathaniel
Re: [Numpy-discussion] A crazy masked-array thought
Nathaniel Smith wrote:
> On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley rhatters...@gmail.com wrote:
>> So, assuming numpy.ndarray became a strict subclass of some new masked array, it looks plausible that adding just a few checks to numpy.ndarray to exclude the masked superclass would prevent much downstream code from accidentally operating on masked arrays.
>
> I think the main point I was trying to make is that it's the existence and content of these checks that matters. They don't necessarily have any relation at all to which thing Python calls a superclass or a subclass.
>
> -- Nathaniel

I don't agree with the argument that ma should be a superclass of ndarray. It is ma that is adding features. That makes it a subclass. We're not talking mathematics here.

There is a well-known disease of OOP where everything seems to bubble up to the top of the class hierarchy - so that the base class becomes bloated to support every feature needed by subclasses. I believe that's considered poor design.

Is there a way to support ma as a subclass of ndarray, without introducing overhead into ndarray? Without having given this much real thought, I do have some idea. What are the operations that we need on arrays? The most basic are:

1. element access
2. get size (shape)

In an OO design, these would be virtual functions (or in C, pointers to functions). But this would introduce unacceptable overhead.

In a generic programming design (C++ templates), we would essentially generate 2 copies of every function, one that operates on plain arrays, and one that operates on masked arrays, each using the appropriate function for element access, shape, etc. This way, no unneeded overhead is introduced (although the code size is increased - but this is probably of little consequence on a modern demand-paged OS).

Following this approach, ma and ndarray don't have to have any inheritance relation. OTOH, inheritance is probably useful since there are many common features to ma and ndarray, and a lot of code could be shared.
Re: [Numpy-discussion] A crazy masked-array thought
On Sat, Apr 28, 2012 at 10:58 AM, Neal Becker ndbeck...@gmail.com wrote:
> Nathaniel Smith wrote:
>> On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley rhatters...@gmail.com wrote:
>>> So, assuming numpy.ndarray became a strict subclass of some new masked array, it looks plausible that adding just a few checks to numpy.ndarray to exclude the masked superclass would prevent much downstream code from accidentally operating on masked arrays.
>>
>> I think the main point I was trying to make is that it's the existence and content of these checks that matters. They don't necessarily have any relation at all to which thing Python calls a superclass or a subclass.
>>
>> -- Nathaniel
>
> I don't agree with the argument that ma should be a superclass of ndarray. It is ma that is adding features. That makes it a subclass. We're not talking mathematics here.

It isn't a subclass either. In a true subclass, anything that worked on the base class would work equally well on a subclass *without modification*. Basically, it's an independent class with special functions that can handle combinations and ufuncs. Look at all the functions exported in numpy/ma/core.py. Inheritance really isn't a concept appropriate to this case. Pretty much all the functions are rewritten for masked arrays, which is one reason maintenance is a hassle: lots of things have to be maintained in two places.

> There is a well-known disease of OOP where everything seems to bubble up to the top of the class hierarchy - so that the base class becomes bloated to support every feature needed by subclasses. I believe that's considered poor design.
>
> Is there a way to support ma as a subclass of ndarray, without introducing overhead into ndarray? Without having given this much real thought, I do have some idea. What are the operations that we need on arrays? The most basic are:
>
> 1. element access
> 2. get size (shape)
>
> In an OO design, these would be virtual functions (or in C, pointers to functions). But this would introduce unacceptable overhead.

Sure, and you would still have two different functions of almost everything.

> In a generic programming design (C++ templates), we would essentially generate 2 copies of every function, one that operates on plain arrays, and one that operates on masked arrays, each using the appropriate function for element access, shape, etc. This way, no unneeded overhead is introduced (although the code size is increased - but this is probably of little consequence on a modern demand-paged OS).
>
> Following this approach, ma and ndarray don't have to have any inheritance relation. OTOH, inheritance is probably useful since there are many common features to ma and ndarray, and a lot of code could be shared.

Not many common behaviours. Analogous behaviours, perhaps. And since everything ends up written twice, the best way to share code is to do it in the base class.

Chuck
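The "without modification" point can be seen with today's numpy.ma. The sketch below is an illustration rather than anything posted in the thread: `total` is a hypothetical routine written with plain arrays in mind, and it sanitises its input with `numpy.asarray`, which returns a base-class ndarray and therefore drops the mask.

```python
import numpy as np


def total(values):
    """Hypothetical routine written for plain arrays."""
    # np.asarray returns a base-class ndarray, so any mask is dropped here.
    return np.asarray(values).sum()


masked = np.ma.array([1, 2, 3], mask=[False, True, False])

plain_answer = total(masked)   # sums the underlying data, mask ignored
masked_answer = masked.sum()   # the masked element is skipped
```

The two answers differ, and nothing warns the caller, which is why a masked array does not behave like a drop-in subclass instance for code of this kind.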
Re: [Numpy-discussion] A crazy masked-array thought
I know I used a somewhat jokey tone in my original posting, but fundamentally it was a serious question concerning a live topic. So I'm curious about the lack of response. Has this all been covered before?

Sorry if I'm being too impatient!

On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com wrote:
> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>
> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>
> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.
>
> Richard Hattersley
Re: [Numpy-discussion] A crazy masked-array thought
On Fri, Apr 27, 2012 at 6:32 AM, Richard Hattersley rhatters...@gmail.com wrote:
> I know I used a somewhat jokey tone in my original posting, but fundamentally it was a serious question concerning a live topic. So I'm curious about the lack of response. Has this all been covered before?
>
> Sorry if I'm being too impatient!

Richard,

Actually, I am rather surprised by the lack of response as well. This is quite unusual, and I hope it doesn't sour you on making more contributions. We do need more crazy ideas like yours, if only to help break out of an infinite loop in a discussion.

Your idea is interesting, but doesn't it require C++? Or maybe you are thinking of creating a new C type object that would contain all the new features and hold a pointer and function interface to the original POA. Essentially, the new type would act as a wrapper around the original ndarray?

Cheers!
Ben Root
Re: [Numpy-discussion] A crazy masked-array thought
On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley rhatters...@gmail.com wrote:
> I know I used a somewhat jokey tone in my original posting, but fundamentally it was a serious question concerning a live topic. So I'm curious about the lack of response. Has this all been covered before?
>
> Sorry if I'm being too impatient!

That's fine, I know I did read it, but I wasn't sure what to make of it to respond :-)

> On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com wrote:
>> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>>
>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>>
>> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.

This makes a certain amount of sense from a traditional OO modeling perspective, where classes are supposed to refer to sets of objects and subclasses are subsets and superclasses are supersets. This is the property that's needed to guarantee that if A is a subclass of B, then any code that expects a B can also handle an A, since all A's are B's, which is what you need if you're doing type-checking or type-based dispatch. And indeed, from this perspective, MAs are a superclass of POAs, because for every POA there's an equivalent MA (the one with the mask set to all-true), but not vice-versa.

But that model of OO doesn't have much connection to Python. In Python's semantics, classes are almost irrelevant; they're mostly just some convenience tools for putting together the objects you want, and what really matters is the behavior of each object (the famous duck typing). You can call isinstance() if you want, but it's just an ordinary function that looks at some attributes on an object; the only magic involved is that some of those attributes have underscores in their name.

In Python, subclassing mostly does two things:

(1) It's a quick way to set up a class that's similar to another class (though this is a worse idea than it looks -- you're basically doing 'from other_class import *' with all the usual tight-coupling problems that 'import *' brings).

(2) When writing Python objects at the C level, subclassing lets you achieve memory layout compatibility (which is important because C does *not* do duck typing), and it lets you add new fields to a C struct. So at this level, MAs are a subclass of POAs, because MAs have an extra field that POAs don't...

So I don't know what to think about subclasses/superclasses here, because they're such confusing and contradictory concepts that it's hard to tell what the actual resulting API semantics would be.

- N
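The remark that isinstance() is just a function looking at underscore attributes can be made concrete with the standard library's `abc` module. In this sketch `MaskAware` and `PlainArray` are hypothetical names invented for the illustration; the point is only that an isinstance() relationship can exist with no inheritance between the two classes.

```python
from abc import ABC


class MaskAware(ABC):
    """Hypothetical marker: 'this object may be treated as a masked array'."""


class PlainArray:
    """Hypothetical plain array; it does not inherit from MaskAware."""

    def __init__(self, data):
        self.data = list(data)


# Registering a "virtual subclass" makes isinstance() succeed,
# even though PlainArray's inheritance chain is untouched.
MaskAware.register(PlainArray)

p = PlainArray([1, 2, 3])
is_mask_aware = isinstance(p, MaskAware)
in_mro = MaskAware in type(p).__mro__
```

Here isinstance() is answered by the `__instancecheck__` hook on the ABC's metaclass, so what it reports is a matter of which checks are written, not of memory layout or method inheritance.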
Re: [Numpy-discussion] A crazy masked-array thought
On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley rhatters...@gmail.com wrote:
> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>
> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>
> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.

That's a version of the idea that all arrays have masks, just some of them have missing masks. That construction was mentioned in the thread but I can see how one might have missed it. I think it is the right way to do things. However, current libraries and such will still need to do some work in order to not do the wrong thing when a real mask is present. For instance, check and raise an error if they can't deal with it.

Chuck
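With the existing numpy.ma module, a "check and raise" guard of the kind mentioned above might look like the following. This is a minimal sketch, not the mechanism under discussion in the thread: `require_unmasked` is a hypothetical helper name, and it relies only on `numpy.ma.is_masked` and `numpy.asarray`.

```python
import numpy as np


def require_unmasked(values):
    """Return a plain ndarray, refusing input that has masked elements."""
    if np.ma.is_masked(values):
        raise ValueError("masked values are not supported by this routine")
    # Safe to drop the (empty) mask: no element was actually masked.
    return np.asarray(values)
```

A masked array whose mask is entirely False passes through as a plain ndarray, while one with any masked element raises instead of silently producing a wrong result.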
Re: [Numpy-discussion] A crazy masked-array thought
On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley rhatters...@gmail.com wrote:
>> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>>
>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>>
>> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.
>
> That's a version of the idea that all arrays have masks, just some of them have missing masks. That construction was mentioned in the thread but I can see how one might have missed it. I think it is the right way to do things. However, current libraries and such will still need to do some work in order to not do the wrong thing when a real mask was present. For instance, check and raise an error if they can't deal with it.

To expand a bit more, this is precisely why the current work on making masks part of ndarray rather than a subclass was undertaken. There is a flag that says whether or not the array is masked, but you will still need to check that flag to see if you are working with an unmasked instance of ndarray. At the moment the masked version isn't quite completely fused with ndarrays-classic since the maskedness needs to be specified in the constructors and such, but what you suggest is actually what we are working towards.

No matter what is done, current functions and libraries that want to use masks are going to have to deal with the existence of both masked and unmasked arrays since the existence of a mask can't be ignored without risking wrong results.

Chuck
Re: [Numpy-discussion] A crazy masked-array thought
On Fri, Apr 27, 2012 at 10:33 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris charlesr.har...@gmail.com wrote:
>> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley rhatters...@gmail.com wrote:
>>> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>>>
>>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>>>
>>> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.
>>
>> That's a version of the idea that all arrays have masks, just some of them have missing masks. That construction was mentioned in the thread but I can see how one might have missed it. I think it is the right way to do things. However, current libraries and such will still need to do some work in order to not do the wrong thing when a real mask was present. For instance, check and raise an error if they can't deal with it.
>
> To expand a bit more, this is precisely why the current work on making masks part of ndarray rather than a subclass was undertaken. There is a flag that says whether or not the array is masked, but you will still need to check that flag to see if you are working with an unmasked instance of ndarray. At the moment the masked version isn't quite completely fused with ndarrays-classic since the maskedness needs to be specified in the constructors and such, but what you suggest is actually what we are working towards.
>
> No matter what is done, current functions and libraries that want to use masks are going to have to deal with the existence of both masked and unmasked arrays since the existence of a mask can't be ignored without risking wrong results.

(In case it's not the wrong thread)

If every ndarray has this maskflag, then it is easy to adjust other library code.

    if myarr.maskflag is not None:
        raise SorryException

What is expensive is having to do np.isnan(myarr) or np.isfinite(myarr) everywhere. https://github.com/scipy/scipy/pull/48

As a concept I like the idea: masked arrays are the general class with generic defaults, and clean arrays are a subclass where some methods are overwritten with faster implementations.

Josef

> Chuck
Re: [Numpy-discussion] A crazy masked-array thought
Hi all,

Thanks for all your responses and for your patience with a newcomer. Don't worry - I'm not going to give up yet. It's all just part of my learning the ropes.

On 27 April 2012 14:05, Benjamin Root ben.r...@ou.edu wrote:
> <snip>
> Your idea is interesting, but doesn't it require C++? Or maybe you are thinking of creating a new C type object that would contain all the new features and hold a pointer and function interface to the original POA. Essentially, the new type would act as a wrapper around the original ndarray?
> </snip>

When talking about subclasses I'm just talking about the end-user experience within Python. In other words, I'm starting from issubclass(POA, MA) == True, and trying to figure out what the Python API implications would be.

On 27 April 2012 14:55, Nathaniel Smith n...@pobox.com wrote:
> On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley rhatters...@gmail.com wrote:
>> I know I used a somewhat jokey tone in my original posting, but fundamentally it was a serious question concerning a live topic. So I'm curious about the lack of response. Has this all been covered before?
>>
>> Sorry if I'm being too impatient!
>
> That's fine, I know I did read it, but I wasn't sure what to make of it to respond :-)
>
>> On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com wrote:
>>> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>>>
>>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>>>
>>> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.
>
> This makes a certain amount of sense from a traditional OO modeling perspective, where classes are supposed to refer to sets of objects and subclasses are subsets and superclasses are supersets. This is the property that's needed to guarantee that if A is a subclass of B, then any code that expects a B can also handle an A, since all A's are B's, which is what you need if you're doing type-checking or type-based dispatch. And indeed, from this perspective, MAs are a superclass of POAs, because for every POA there's an equivalent MA (the one with the mask set to all-true), but not vice-versa.
>
> But that model of OO doesn't have much connection to Python. In Python's semantics, classes are almost irrelevant; they're mostly just some convenience tools for putting together the objects you want, and what really matters is the behavior of each object (the famous duck typing). You can call isinstance() if you want, but it's just an ordinary function that looks at some attributes on an object; the only magic involved is that some of those attributes have underscores in their name.
>
> In Python, subclassing mostly does two things:
>
> (1) It's a quick way to set up a class that's similar to another class (though this is a worse idea than it looks -- you're basically doing 'from other_class import *' with all the usual tight-coupling problems that 'import *' brings).
>
> (2) When writing Python objects at the C level, subclassing lets you achieve memory layout compatibility (which is important because C does *not* do duck typing), and it lets you add new fields to a C struct. So at this level, MAs are a subclass of POAs, because MAs have an extra field that POAs don't...
>
> So I don't know what to think about subclasses/superclasses here, because they're such confusing and contradictory concepts that it's hard to tell what the actual resulting API semantics would be.

It doesn't seem essential that MAs have an extra field that POAs don't. If POA was a subclass of MA, instances of POA could have the extra field set to an all-valid/nothing-is-masked value. Granted, you'd want that to be a special value so you're not lugging around a load of redundant data (and you can optimise your processing for that), but I'm guessing you'd probably want that kind of capability within MA anyway.

On 27 April 2012 15:33, Charles R Harris charlesr.har...@gmail.com wrote:
> On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris charlesr.har...@gmail.com wrote:
>> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley rhatters...@gmail.com wrote:
>>> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too
Re: [Numpy-discussion] A crazy masked-array thought
On Fri, Apr 27, 2012 at 9:16 AM, josef.p...@gmail.com wrote:
> On Fri, Apr 27, 2012 at 10:33 AM, Charles R Harris charlesr.har...@gmail.com wrote:
>> On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris charlesr.har...@gmail.com wrote:
>>> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley rhatters...@gmail.com wrote:
>>>> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>>>>
>>>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>>>>
>>>> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.
>>>
>>> That's a version of the idea that all arrays have masks, just some of them have missing masks. That construction was mentioned in the thread but I can see how one might have missed it. I think it is the right way to do things. However, current libraries and such will still need to do some work in order to not do the wrong thing when a real mask was present. For instance, check and raise an error if they can't deal with it.
>>
>> To expand a bit more, this is precisely why the current work on making masks part of ndarray rather than a subclass was undertaken. There is a flag that says whether or not the array is masked, but you will still need to check that flag to see if you are working with an unmasked instance of ndarray. At the moment the masked version isn't quite completely fused with ndarrays-classic since the maskedness needs to be specified in the constructors and such, but what you suggest is actually what we are working towards.
>>
>> No matter what is done, current functions and libraries that want to use masks are going to have to deal with the existence of both masked and unmasked arrays since the existence of a mask can't be ignored without risking wrong results.
>
> (In case it's not the wrong thread)
>
> If every ndarray has this maskflag, then it is easy to adjust other library code.

That is the case.

    In [1]: ones(1).flags
    Out[1]:
      C_CONTIGUOUS : True
      F_CONTIGUOUS : True
      OWNDATA : True
      MASKNA : False
      OWNMASKNA : False
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False

What I'd like to add is that the mask is only allocated when NA (or equivalent) is assigned. That way the flag also signals the actual presence of a masked value.

> if myarr.maskflag is not None: raise SorryException
>
> What is expensive is having to do np.isnan(myarr) or np.isfinite(myarr) everywhere. https://github.com/scipy/scipy/pull/48
>
> As a concept I like the idea, masked arrays are the general class with generic defaults, clean arrays are a subclass where some methods are overwritten with faster implementations.

Chuck
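The idea of allocating the mask only when a value is first masked, so that the flag doubles as "a masked value is actually present", can be sketched in pure Python. Everything here is hypothetical: `LazyMaskArray`, `has_mask`, `mask_element` and `library_routine` are illustrative names, not part of any NumPy API, and `has_mask` merely plays the role of the flag discussed above.

```python
class LazyMaskArray:
    """Hypothetical array whose mask storage is created on first use."""

    def __init__(self, data):
        self.data = list(data)
        self._mask = None  # no mask storage until something is masked

    @property
    def has_mask(self):
        # True only once at least one element has been masked.
        return self._mask is not None

    def mask_element(self, index):
        if self._mask is None:
            self._mask = [False] * len(self.data)
        self._mask[index] = True


def library_routine(arr):
    """Hypothetical downstream code that cannot handle masks."""
    if arr.has_mask:
        raise TypeError("masked input is not supported")
    return sum(arr.data)
```

Under this assumption the downstream check stays a single attribute lookup, instead of a scan over the data such as np.isnan(myarr).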
Re: [Numpy-discussion] A crazy masked-array thought
On Apr 25, 2012, at 10:58 AM, Richard Hattersley wrote:
> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>
> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

Ultimately, this is what Chuck and Mark are advocating, I believe. It's not a crazy idea. In fact, it's probably more correct in that masked arrays *are* more general than POAs. If we were starting from scratch in 1994 (Numeric days), I could see taking this route and setting expectations correctly for downstream libraries.

There are three problems I see with jamming this concept into NumPy 1.X, however, by modifying all POA data-structures to now *be* masked arrays.

1) There is a lot of code out there that does not know anything about masks and is not used to checking for masks. It enlarges the basic abstraction in a way that is not backwards compatible *conceptually*. This smells fishy to me and I could see a lot of downstream problems from libraries that rely on NumPy.

2) We cannot agree on how masks should be handled and consequently don't have a real plan for migrating numpy.ma to use these masks. So, we are just growing the API and introducing uncertainty for unclear benefit --- especially for the person that does not want to use masks.

3) Subclassing in C in Python requires that C-structures are *binary* compatible. This implies that all subclasses have *more* attributes than the superclass. The way it is currently implemented, that means that POAs would have these extra pointers they don't need sitting there to satisfy that requirement. From a C-struct perspective it therefore makes more sense for MAs to inherit from POAs. Ideally, that shouldn't drive the design, but it's part of the landscape in NumPy 1.X.

I have some ideas about how to move forward, but I'm anxiously awaiting the write-up that Mark and Nathaniel are working on to inform and enhance those ideas.

Masked arrays do have a long history in the Numeric and NumPy code base. Paul Dubois originally created the first masked arrays in Numeric and helped move them to numpy.ma. Pierre GM took that code and worked very hard to add a lot of features. I'm very concerned about adding a new masked array abstraction into the *core* of all NumPy arrays, especially one that is not well informed by this history nor its user base.

I was just visiting LLNL a couple of weeks ago and realized that they are using masked arrays very heavily in UV-CDAT and elsewhere. I've also seen many other people in industry, academia, and government use masked arrays. I've typically squirmed at that because I know that masked arrays have performance issues because they are in Python. I've also wondered about masked arrays as *subclasses* of POAs because of how much code has to be rewritten in the sub-class for it to work correctly.

So, in summary: my view is that NumPy has masked arrays already (and has had them for a long, long time). Missing data is only one of the use-cases for masked arrays (though it is probably the dominant use case for numpy.ma). Independent of the missing-data story, any plan to add masks directly to a base-object in NumPy needs to take into account the numpy.ma user-base and the POA user-base that does not expect to be dealing with masks.

That doesn't mean it needs to follow numpy.ma design choices and API. It does, however, need to think about how a typical numpy.ma user could instead use the new masked array concept, and how numpy.ma itself could be revised to use the new masked array concept.

I think Mark has done some amazing coding and I would like to keep as much of it as possible available to people. We may need to adjust *how* it is presented downstream, but I'm hopeful that we can do that.

Thanks for your ideas and your comments.

-Travis

> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.
>
> Richard Hattersley