Re: [Numpy-discussion] numpy.array() of mixed integers and strings can truncate data

2011-12-05 Thread Thouis Jones
On Fri, Dec 2, 2011 at 18:53, Charles R Harris
charlesr.har...@gmail.com wrote:

 After sleeping on this, I think an object array in this situation would be
 the better choice and wouldn't result in lost information. This might change
 the behavior of
 some functions though, so would need testing.

I tried to come up with a simple patch to achieve this, but I think
this is beyond me, particularly since I think something different has
to happen for these cases:
np.array([1234, 'ab'])
np.array([1234]).astype('|S2')
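To illustrate the two cases (my sketch, not part of the original report; the exact promoted dtype varies across NumPy versions), passing dtype=object explicitly is the lossless behavior being proposed:

```python
import numpy as np

# Mixing an int with a short string could, on 1.6-era NumPy, silently
# truncate the integer's digits.  An explicit object dtype keeps both
# values intact regardless of version.
a = np.array([1234, 'ab'], dtype=object)
assert a[0] == 1234 and a[1] == 'ab'   # no truncation
assert a.dtype == np.dtype(object)

# The second problematic case: casting an int array to a too-short
# fixed-width string dtype ('S2' holds only two bytes).
b = np.array([1234]).astype('S2')
assert b.dtype == np.dtype('S2')
```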

I tried a few things (changing the rules in PyArray_PromoteTypes(),
other places), but I think I'm more likely to break some corner case
than fix this cleanly.

I filed a ticket (#1990) and a pull request to add a test to the 1.6.x
maintenance branch, for someone more knowledgeable than me to address.
 I tried to write the test so that it passes with either choice:
dtype=object or a string dtype of the required length.

Ray Jones
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy Governance

2011-12-05 Thread Perry Greenfield
I'm not sure I'm crazy about leaving final decision making to a
board. A board may be a good way of carefully considering the issues,
and it could make its own recommendation (with a sufficient
majority). But in the end I think one person needs to decide (and that
decision may go against the board consensus, presumably only rarely).

Why shouldn't that person be you?

Perry

On Dec 4, 2011, at 11:32 PM, Travis Oliphant wrote:

 Great points.   My initial suggestion of 5-11 was more about current  
 board size rather than trying to fix it.

 I agree that having someone represent from major downstream projects  
 would be a great thing.

 -Travis


 On Dec 4, 2011, at 7:16 AM, Alan G Isaac wrote:

 On 12/4/2011 1:43 AM, Charles R Harris wrote:
 I don't think there are 5 active developers, let alone 11.
 With hard work you might scrape together two or three.
 Having 5 or 11 people making decisions for the two or
 three actually doing the work isn't going to go over well.

 Very true! But you might consider including on any board
 a developer or two from important projects that are very
 NumPy dependent.  (E.g., Matplotlib.)

 One other thing: how about starting with a board of 3
 and a rule that says any active developer can request to
 join, that additions are determined by majority vote of
 the existing board, and  that having the board both small
 and odd numbered is a priority?  (Fixing the board size
 in advance for a project we all hope will grow substantially
 seems odd.)

 fwiw,
 Alan Isaac


 ---
 Travis Oliphant
 Enthought, Inc.
 oliph...@enthought.com
 1-512-536-1057
 http://www.enthought.com





Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread Charles R Harris
Hi Geoffrey,

On Mon, Dec 5, 2011 at 12:37 AM, Geoffrey Irving irv...@naml.us wrote:

 On Sun, Dec 4, 2011 at 6:45 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Sun, Dec 4, 2011 at 6:59 PM, Geoffrey Irving irv...@naml.us wrote:
 
  On Sun, Dec 4, 2011 at 5:18 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
  
  
   On Sun, Dec 4, 2011 at 5:41 PM, Geoffrey Irving irv...@naml.us
 wrote:
  
   This may be the problem.  Simple diffs are pleasant.  I'm guessing
   this code doesn't get a lot of testing.  Glad it's there, though!
  
   Geoffrey
  
   diff --git a/numpy/core/src/umath/ufunc_type_resolution.c
   b/numpy/core/src/umath/ufunc_type_resolution.c
   index 0d6cf19..a93eda1 100644
   --- a/numpy/core/src/umath/ufunc_type_resolution.c
   +++ b/numpy/core/src/umath/ufunc_type_resolution.c
   @@ -1866,7 +1866,7 @@ linear_search_type_resolver(PyUFuncObject
 *self,
   case -1:
   return -1;
   /* A loop was found */
   -case 1:
   +case 0:
   return 0;
   }
   }
  
  
   Heh. Can you verify that this fixes the problem? That function is only
   called once  and its return value is passed up the chain, but the
   documented
   return values of that calling function are -1, 0. So the documentation
   needs
   to be changed if this is the right thing to do.
 
  Actually, that patch was wrong, since
  linear_search_userloop_type_resolver needs to return three values
  (error, not-found, success).  A better patch follows.  I can confirm
  that this gets me further, but I get other failures down the line, so
  more fixes may follow.  I'll push the branch with all my fixes for
  convenience once I have everything working.
 
   Speaking of tests... I was wondering if you could be talked into
 putting
   together a simple user type for including in the tests?
 
  Yep, though likely not for a couple weeks.  If there's interest, I
  could also be convinced to sanitize my entire rational class so you
  could include that directly.  Currently it's both C++ and uses some
  gcc specific features like __int128_t.  Basically it's
  numerator/denominator, where both are 64 bit integers, and an
  OverflowError is thrown if anything can't be represented as such
  (possibly a different exception would be better in cases like
  (1<<64)/((1<<64)+1)).  It would be easy to generalize it to rational32
  vs. rational64 as well.
 
  If you want tests but not rational, it would be straightforward to
  strip what I have down to a bare bones test case.
 
 
  We'll see how much interest there is. If it becomes official you may get
  more feedback on features. There are some advantages to having some user
  types in numpy. One is that otherwise they tend to get lost, another is
 that
  having a working example or two provides templates for others to work
  from, and finally they provide test material. Because official user types
  aren't assigned anywhere there might also be some conflicts. Maybe
 something
  like an extension types module would be a way around that. In any case, I
  think both rational numbers and quaternions would be useful to have and I
  hope there is some discussion of how to do that. Rationals may be a bit
  trickier than quaternions though, as usually they are used to provide
 exact
  arithmetic without concern for precision. I don't know how restrictive
 the
  64 bit limitation will be in practice. What are you using them for?

 I'm using them for frivolous analysis of poker Nash equilibria.  I'll
 let others decide if it has any non-toy uses.  64 bits seems to be
 enough for me, though it's possible that I'll run into trouble with
 other examples.  It's still exact, though, in the sense that it throws
 an exception rather than doing anything weird if it overflows.  And it
 has the key advantage of being orders of magnitude faster than object
 arrays of Fractions.
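For comparison, here is a small sketch (mine, not from the thread) of the object-array-of-Fractions approach a native rational dtype aims to outperform; arithmetic stays exact, but every element operation is a Python-level call:

```python
import numpy as np
from fractions import Fraction

# Object array of exact rationals: correct but slow, because each
# elementwise operation dispatches to a Python Fraction method.
a = np.array([Fraction(1, 3), Fraction(2, 3), Fraction(1, 7)], dtype=object)

b = a * 21                 # exact rational arithmetic, no rounding
assert b[0] == 7           # (1/3) * 21 == 7
assert b[2] == 3           # (1/7) * 21 == 3
assert sum(a) == Fraction(8, 7)   # 1/3 + 2/3 + 1/7, exactly
```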

 Back to the bugs: here's a branch with all the changes I needed to get
 rational arithmetic to work:

https://github.com/girving/numpy

 I discovered two more after the last email.  One is another simple 0
 vs. 1 bug, and another is somewhat optional:

 commit 730b05a892371d6f18d9317e5ae6dc306c0211b0
 Author: Geoffrey Irving irv...@naml.us
 Date:   Sun Dec 4 20:03:46 2011 -0800

After loops, check for PyErr_Occurred() even if needs_api is 0

For certain types of user defined classes, casting and ufunc loops
normally run without the Python API, but occasionally need to throw
an error.  Currently we assume that !needs_api means no errors occur.
However, the fastest way to implement such loops is to run without
the GIL normally and use PyGILState_Ensure/Release if an error occurs.

In order to support this usage pattern, change all post-loop checks from

needs_api && PyErr_Occurred()

to simply

PyErr_Occurred()
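The usage pattern the commit describes can be sketched as a pure-Python analogue, with a threading.Lock standing in for the GIL (illustrative only; the real code is C using PyGILState_Ensure/Release):

```python
import threading

# Hot loop runs without holding the lock (the GIL in the real C code);
# the lock is acquired only on the rare error path to record the error.
state_lock = threading.Lock()      # stands in for the GIL
pending_errors = []                # stands in for the Python error state

def inner_loop(values, out):
    for i, v in enumerate(values):
        if v == 0:                 # rare error case
            with state_lock:       # "PyGILState_Ensure/Release"
                pending_errors.append("division by zero at index %d" % i)
            return
        out[i] = 1.0 / v           # hot path: no lock taken

out = [0.0] * 3
inner_loop([1, 2, 4], out)         # clean run: no error recorded
assert not pending_errors
inner_loop([1, 0, 4], [0.0] * 3)   # error run: one error recorded
assert len(pending_errors) == 1
```

The post-loop check in the caller then has to look at the error state unconditionally, which is exactly why the commit drops the needs_api guard.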


Thanks. Could you put this work into a separate branch, say fixuserloops,
and enter a 

Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread Geoffrey Irving
On Mon, Dec 5, 2011 at 6:59 AM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Hi Geoffrey,

 On Mon, Dec 5, 2011 at 12:37 AM, Geoffrey Irving irv...@naml.us wrote:

 On Sun, Dec 4, 2011 at 6:45 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Sun, Dec 4, 2011 at 6:59 PM, Geoffrey Irving irv...@naml.us wrote:
 
  On Sun, Dec 4, 2011 at 5:18 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
  
  
   On Sun, Dec 4, 2011 at 5:41 PM, Geoffrey Irving irv...@naml.us
   wrote:
  
   This may be the problem.  Simple diffs are pleasant.  I'm guessing
   this code doesn't get a lot of testing.  Glad it's there, though!
  
   Geoffrey
  
   diff --git a/numpy/core/src/umath/ufunc_type_resolution.c
   b/numpy/core/src/umath/ufunc_type_resolution.c
   index 0d6cf19..a93eda1 100644
   --- a/numpy/core/src/umath/ufunc_type_resolution.c
   +++ b/numpy/core/src/umath/ufunc_type_resolution.c
   @@ -1866,7 +1866,7 @@ linear_search_type_resolver(PyUFuncObject
   *self,
               case -1:
                   return -1;
               /* A loop was found */
   -            case 1:
   +            case 0:
                   return 0;
           }
       }
  
  
   Heh. Can you verify that this fixes the problem? That function is
   only
   called once  and its return value is passed up the chain, but the
   documented
   return values of that calling function are -1, 0. So the
   documentation
   needs
   to be changed if this is the right thing to do.
 
  Actually, that patch was wrong, since
  linear_search_userloop_type_resolver needs to return three values
  (error, not-found, success).  A better patch follows.  I can confirm
  that this gets me further, but I get other failures down the line, so
  more fixes may follow.  I'll push the branch with all my fixes for
  convenience once I have everything working.
 
   Speaking of tests... I was wondering if you could be talked into
   putting
   together a simple user type for including in the tests?
 
  Yep, though likely not for a couple weeks.  If there's interest, I
  could also be convinced to sanitize my entire rational class so you
  could include that directly.  Currently it's both C++ and uses some
  gcc specific features like __int128_t.  Basically it's
  numerator/denominator, where both are 64 bit integers, and an
  OverflowError is thrown if anything can't be represented as such
  (possibly a different exception would be better in cases like
  (1<<64)/((1<<64)+1)).  It would be easy to generalize it to rational32
  vs. rational64 as well.
 
  If you want tests but not rational, it would be straightforward to
  strip what I have down to a bare bones test case.
 
 
  We'll see how much interest there is. If it becomes official you may get
  more feedback on features. There are some advantages to having some user
  types in numpy. One is that otherwise they tend to get lost, another is
  that
  having a working example or two provides templates for others to work
  from, and finally they provide test material. Because official user
  types
  aren't assigned anywhere there might also be some conflicts. Maybe
  something
  like an extension types module would be a way around that. In any case,
  I
  think both rational numbers and quaternions would be useful to have and
  I
  hope there is some discussion of how to do that. Rationals may be a bit
  trickier than quaternions though, as usually they are used to provide
  exact
  arithmetic without concern for precision. I don't know how restrictive
  the
  64 bit limitation will be in practice. What are you using them for?

 I'm using them for frivolous analysis of poker Nash equilibria.  I'll
 let others decide if it has any non-toy uses.  64 bits seems to be
 enough for me, though it's possible that I'll run into trouble with
 other examples.  It's still exact, though, in the sense that it throws
 an exception rather than doing anything weird if it overflows.  And it
 has the key advantage of being orders of magnitude faster than object
 arrays of Fractions.

 Back to the bugs: here's a branch with all the changes I needed to get
 rational arithmetic to work:

    https://github.com/girving/numpy

 I discovered two more after the last email.  One is another simple 0
 vs. 1 bug, and another is somewhat optional:

 commit 730b05a892371d6f18d9317e5ae6dc306c0211b0
 Author: Geoffrey Irving irv...@naml.us
 Date:   Sun Dec 4 20:03:46 2011 -0800

    After loops, check for PyErr_Occurred() even if needs_api is 0

    For certain types of user defined classes, casting and ufunc loops
    normally run without the Python API, but occasionally need to throw
    an error.  Currently we assume that !needs_api means no errors occur.
    However, the fastest way to implement such loops is to run without
    the GIL normally and use PyGILState_Ensure/Release if an error occurs.

    In order to support this usage pattern, change all post-loop checks
 from

        needs_api && PyErr_Occurred()

    to simply

        

Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread David Cournapeau
On Sun, Dec 4, 2011 at 9:45 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 We'll see how much interest there is. If it becomes official you may get
 more feedback on features. There are some advantages to having some user
 types in numpy. One is that otherwise they tend to get lost, another is that
 having a working example or two provides a templates for others to work
 from, and finally they provide test material. Because official user types
 aren't assigned anywhere there might also be some conflicts. Maybe something
 like an extension types module would be a way around that. In any case, I
 think both rational numbers and quaternions would be useful to have and I
 hope there is some discussion of how to do that.

I agree that those will be useful, but I am worried about adding more
stuff in multiarray. User-types should really be separated from
multiarray. Ideally, they should be plugins but separated from
multiarray would be a good first step.

I realize it is a bit unfair to have this ready for Geoffrey's code
changes, but depending on the timelines for the 2.0.0 milestone, I
think this would be a useful thing to have. Otherwise, if some ABI/API
changes are needed after 2.0, we will be dragged down with this for
years. I am willing to spend time on this. Geoffrey, does this sound
acceptable to you?

David


Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread Mark Wiebe
On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote:

 snip

 Back to the bugs: here's a branch with all the changes I needed to get
 rational arithmetic to work:

https://github.com/girving/numpy

 I discovered two more after the last email.  One is another simple 0
 vs. 1 bug, and another is somewhat optional:

 commit 730b05a892371d6f18d9317e5ae6dc306c0211b0
 Author: Geoffrey Irving irv...@naml.us
 Date:   Sun Dec 4 20:03:46 2011 -0800

After loops, check for PyErr_Occurred() even if needs_api is 0

For certain types of user defined classes, casting and ufunc loops
normally run without the Python API, but occasionally need to throw
an error.  Currently we assume that !needs_api means no errors occur.
However, the fastest way to implement such loops is to run without
the GIL normally and use PyGILState_Ensure/Release if an error occurs.

In order to support this usage pattern, change all post-loop checks from

needs_api && PyErr_Occurred()

to simply

PyErr_Occurred()


To support this properly, I think we would need to convert needs_api into
an enum with this hybrid mode as another case. While it isn't done
currently, I was imagining using a thread pool to multithread the trivially
data-parallel operations when needs_api is false, and I suspect the
PyGILState_Ensure/Release would trigger undefined behavior in a thread
created entirely outside of the Python system. For comparison, I created a
special mechanism for simplified multi-threaded exceptions in the nditer in
the 'errmsg' parameter:

http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext

Worth considering is also the fact that the PyGILState API is incompatible
with multiple embedded interpreters. Maybe that's not something anyone does
with NumPy, though.

-Mark



 Geoffrey


Re: [Numpy-discussion] NumPy Governance

2011-12-05 Thread Bruce Southey
On 12/05/2011 06:22 AM, Perry Greenfield wrote:
 I'm not sure I'm crazy about leaving final decision making to a
 board. A board may be a good way of carefully considering the issues,
 and it could make its own recommendation (with a sufficient
 majority). But in the end I think one person needs to decide (and that
 decision may go against the board consensus, presumably only rarely).

 Why shouldn't that person be you?

 Perry

I have similar thoughts, because I just do not see how a board would work, 
especially given that anyone can be a 'core developer': the 
distributed workflow removes that 'entry barrier'.

I also think that there needs to be something formal like the Linux Kernel 
Summit (see the excellent coverage by LWN.net; 
http://lwn.net/Articles/KernelSummit2011/). I know that people get 
together to talk at meetings or via invitation 
(http://blog.fperez.org/2011/05/austin-trip-ipython-at-tacc-and.html). 
This would provide a good opportunity to hash out concerns, introduce 
new features and identify community needs that cannot be adequately 
addressed via electronic communication.  The datarray is a 'good' 
example of how this could work except that it has not been pushed 
upstream yet! (It would be an excellent example if it had been pushed 
upstream :-) hint, hint.)

I also must disagree with Travis's statement that discussions happen 
as they do now on the mailing list. This is simply not true, because the 
mailing list, tickets and pull requests are not connected, so each has 
its own discussion threads. Sure, there are some nice examples: Mark 
did tell us about the NA branch, but the actual merge was still a 
surprise. So I think these need better communication, such as emailing 
the list with a set 'public comment period' before pull requests are 
merged (with longer periods for major changes).

Bruce

 On Dec 4, 2011, at 11:32 PM, Travis Oliphant wrote:

 Great points.   My initial suggestion of 5-11 was more about current
 board size rather than trying to fix it.

 I agree that having someone represent from major downstream projects
 would be a great thing.

 -Travis


 On Dec 4, 2011, at 7:16 AM, Alan G Isaac wrote:

 On 12/4/2011 1:43 AM, Charles R Harris wrote:
 I don't think there are 5 active developers, let alone 11.
 With hard work you might scrape together two or three.
 Having 5 or 11 people making decisions for the two or
 three actually doing the work isn't going to go over well.
 Very true! But you might consider including on any board
 a developer or two from important projects that are very
 NumPy dependent.  (E.g., Matplotlib.)

 One other thing: how about starting with a board of 3
 and a rule that says any active developer can request to
 join, that additions are determined by majority vote of
 the existing board, and  that having the board both small
 and odd numbered is a priority?  (Fixing the board size
 in advance for a project we all hope will grow substantially
 seems odd.)

 fwiw,
 Alan Isaac



Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread mark florisson
On 5 December 2011 17:25, Mark Wiebe mwwi...@gmail.com wrote:
 On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote:

 snip


 Back to the bugs: here's a branch with all the changes I needed to get
 rational arithmetic to work:

    https://github.com/girving/numpy

 I discovered two more after the last email.  One is another simple 0
 vs. 1 bug, and another is somewhat optional:

 commit 730b05a892371d6f18d9317e5ae6dc306c0211b0
 Author: Geoffrey Irving irv...@naml.us
 Date:   Sun Dec 4 20:03:46 2011 -0800

    After loops, check for PyErr_Occurred() even if needs_api is 0

    For certain types of user defined classes, casting and ufunc loops
    normally run without the Python API, but occasionally need to throw
    an error.  Currently we assume that !needs_api means no errors occur.
    However, the fastest way to implement such loops is to run without
    the GIL normally and use PyGILState_Ensure/Release if an error occurs.

    In order to support this usage pattern, change all post-loop checks
 from

        needs_api && PyErr_Occurred()

    to simply

        PyErr_Occurred()


 To support this properly, I think we would need to convert needs_api into an
 enum with this hybrid mode as another case. While it isn't done currently, I
 was imagining using a thread pool to multithread the trivially data-parallel
 operations when needs_api is false, and I suspect the
 PyGILState_Ensure/Release would trigger undefined behavior in a thread
 created entirely outside of the Python system.

PyGILState_Ensure/Release can be safely used by non-python threads
with the only requirement that the GIL has been initialized previously
in the main thread (PyEval_InitThreads).

 For comparison, I created a
 special mechanism for simplified multi-threaded exceptions in the nditer in
 the 'errmsg' parameter:

 http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext

 Worth considering is also the fact that the PyGILState API is incompatible
 with multiple embedded interpreters. Maybe that's not something anyone does
 with NumPy, though.

 -Mark



 Geoffrey


Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread Mark Wiebe
On Mon, Dec 5, 2011 at 8:58 AM, David Cournapeau courn...@gmail.com wrote:

 On Sun, Dec 4, 2011 at 9:45 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:

 
  We'll see how much interest there is. If it becomes official you may get
  more feedback on features. There are some advantages to having some user
  types in numpy. One is that otherwise they tend to get lost, another is
 that
  having a working example or two provides templates for others to work
  from, and finally they provide test material. Because official user types
  aren't assigned anywhere there might also be some conflicts. Maybe
 something
  like an extension types module would be a way around that. In any case, I
  think both rational numbers and quaternions would be useful to have and I
  hope there is some discussion of how to do that.

 I agree that those will be useful, but I am worried about adding more
 stuff in multiarray. User-types should really be separated from
 multiarray. Ideally, they should be plugins but separated from
 multiarray would be a good first step.


I think the object and datetime dtypes should also be moved out of the core
multiarray module at some point. The user-type mechanism could be improved
a lot based on Martin's feedback after he did the quaternion
implementation, and needs further expansion to be able to support object
and datetime arrays as currently implemented.

I realize it is a bit unfair to have this ready for Geoffrey's code
 changes, but depending on the timelines for the 2.0.0 milestone, I
 think this would be a useful thing to have. Otherwise, if some ABI/API
 changes are needed after 2.0, we will be dragged down with this for
 years. I am willing to spend time on this. Geoffrey, does this sound
 acceptable to you?


A rational type could be added without breaking the ABI, in the same way it
was done for datetime and half in 1.6. I think the revamp of the user-type
mechanism needs its own NEP design document, because changing it will be a
very delicate operation in dealing with how it interacts with the NumPy
core, and making it much more programmer-friendly will take a fair number
of design iterations.

-Mark



 David


[Numpy-discussion] astype does not work with NA object

2011-12-05 Thread Bruce Southey
Hi,
I mistakenly filed ticket #1973, "Can not display a masked array 
containing np.NA values even if masked" 
(http://projects.scipy.org/numpy/ticket/1973), against masked arrays 
because that was where I found it. But the actual error is that the 
astype function does not handle the NA object:

$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'2.0.0.dev-059334c'
>>> np.array([1,2,3,4]).astype(float)
array([ 1.,  2.,  3.,  4.])
>>> np.array([1,2,3,np.NA]).astype(float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Cannot assign NA to an array which does not support NAs
>>> a = np.array([1,2,3,4], maskna=True)
>>> a[3] = np.NA
>>> a
array([1, 2, 3, NA])
>>> a.astype(float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Cannot assign NA to an array which does not support NAs
>>> a*1.0
array([ 1.,  2.,  3.,  NA])

Bruce


Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread Mark Wiebe
On Mon, Dec 5, 2011 at 9:37 AM, mark florisson markflorisso...@gmail.comwrote:

 On 5 December 2011 17:25, Mark Wiebe mwwi...@gmail.com wrote:
  On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote:
 
  snip
 
 
  Back to the bugs: here's a branch with all the changes I needed to get
  rational arithmetic to work:
 
 https://github.com/girving/numpy
 
  I discovered two more after the last email.  One is another simple 0
  vs. 1 bug, and another is somewhat optional:
 
  commit 730b05a892371d6f18d9317e5ae6dc306c0211b0
  Author: Geoffrey Irving irv...@naml.us
  Date:   Sun Dec 4 20:03:46 2011 -0800
 
 After loops, check for PyErr_Occurred() even if needs_api is 0
 
 For certain types of user defined classes, casting and ufunc loops
 normally run without the Python API, but occasionally need to throw
 an error.  Currently we assume that !needs_api means no errors occur.
 However, the fastest way to implement such loops is to run without
 the GIL normally and use PyGILState_Ensure/Release if an error
 occurs.
 
 In order to support this usage pattern, change all post-loop checks
  from
 
     needs_api && PyErr_Occurred()
 
 to simply
 
 PyErr_Occurred()
 
 
  To support this properly, I think we would need to convert needs_api
 into an
  enum with this hybrid mode as another case. While it isn't done
 currently, I
  was imagining using a thread pool to multithread the trivially
 data-parallel
  operations when needs_api is false, and I suspect the
  PyGILState_Ensure/Release would trigger undefined behavior in a thread
  created entirely outside of the Python system.

 PyGILState_Ensure/Release can be safely used by non-python threads
 with the only requirement that the GIL has been initialized previously
 in the main thread (PyEval_InitThreads).


Is there a way this could efficiently be used to propagate any errors back
to the main thread, for example using TBB as the thread pool? The innermost
task code which calls the inner loop can't call PyErr_Occurred() without
first calling PyGILState_Ensure itself, which would kill utilization.

Maybe this is an ABI problem in NumPy that needs to be fixed, to mandate
that inner loops always return an error code and disallow them from setting
the Python exception state without returning failure.
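A rough Python sketch of that proposed calling convention (hypothetical names, not NumPy API): the inner loop reports failure through its return value, so a worker thread never needs to touch the Python error state at all:

```python
# Proposed convention: 0 on success, -1 on error, mirroring the C ABI
# idea above.  No global exception state is set, so the caller on any
# thread can check the result without holding the GIL.
def inner_loop(values, out):
    """Return 0 on success, -1 on error."""
    for i, v in enumerate(values):
        if v == 0:
            return -1          # error code, no global state touched
        out[i] = 1.0 / v
    return 0

out = [0.0] * 3
assert inner_loop([1, 2, 4], out) == 0
assert out == [1.0, 0.5, 0.25]
assert inner_loop([1, 0, 4], out) == -1   # error signalled via return value
```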

-Mark



  For comparison, I created a
  special mechanism for simplified multi-threaded exceptions in the nditer
 in
  the 'errmsg' parameter:
 
 
 http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext
 
  Worth considering is also the fact that the PyGILState API is
 incompatible
  with multiple embedded interpreters. Maybe that's not something anyone
 does
  with NumPy, though.
 
  -Mark
 
 
 
  Geoffrey


Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread mark florisson
On 5 December 2011 17:48, Mark Wiebe mwwi...@gmail.com wrote:
 On Mon, Dec 5, 2011 at 9:37 AM, mark florisson markflorisso...@gmail.com
 wrote:

 On 5 December 2011 17:25, Mark Wiebe mwwi...@gmail.com wrote:
  On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote:
 
  snip
 
 
  Back to the bugs: here's a branch with all the changes I needed to get
  rational arithmetic to work:
 
     https://github.com/girving/numpy
 
  I discovered two more after the last email.  One is another simple 0
  vs. 1 bug, and another is somewhat optional:
 
  commit 730b05a892371d6f18d9317e5ae6dc306c0211b0
  Author: Geoffrey Irving irv...@naml.us
  Date:   Sun Dec 4 20:03:46 2011 -0800
 
     After loops, check for PyErr_Occurred() even if needs_api is 0
 
     For certain types of user defined classes, casting and ufunc loops
     normally run without the Python API, but occasionally need to throw
     an error.  Currently we assume that !needs_api means no errors occur.
     However, the fastest way to implement such loops is to run without
     the GIL normally and use PyGILState_Ensure/Release if an error
  occurs.
 
     In order to support this usage pattern, change all post-loop checks
  from
 
         needs_api && PyErr_Occurred()
 
     to simply
 
         PyErr_Occurred()
 
 
  To support this properly, I think we would need to convert needs_api
  into an
  enum with this hybrid mode as another case. While it isn't done
  currently, I
  was imagining using a thread pool to multithread the trivially
  data-parallel
  operations when needs_api is false, and I suspect the
  PyGILState_Ensure/Release would trigger undefined behavior in a thread
  created entirely outside of the Python system.

 PyGILState_Ensure/Release can be safely used by non-python threads
 with the only requirement that the GIL has been initialized previously
 in the main thread (PyEval_InitThreads).


 Is there a way this could efficiently be used to propagate any errors back
 to the main thread, for example using TBB as the thread pool? The innermost
 task code which calls the inner loop can't call PyErr_Occurred() without
 first calling PyGILState_Ensure itself, which would kill utilization.

No, there is no way these things can be efficient, as the GIL is
likely contended anyway (I wasn't making a point for these functions,
just wanted to clarify). There is in fact the additional problem that
PyGILState_Ensure would initialize a threadstate, you set an
exception, and when you call PyGILState_Release the threadstate gets
deleted along with the exception, before you will even have a chance
to check with PyErr_Occurred().

For cython.parallel I worked around this by calling PyGILState_Ensure
(to initialize the thread state), followed immediately by
Py_BEGIN_ALLOW_THREADS before starting any work. You then have to
fetch the exception and restore it in another thread when you want to
propagate it. It's a total mess, it's inefficient and if you can avoid
it you should.

 Maybe this is an ABI problem in NumPy that needs to be fixed, to mandate
 that inner loops always return an error code and disallow them from setting
 the Python exception state without returning failure.

That would likely be the best thing.

 -Mark



  For comparison, I created a
  special mechanism for simplified multi-threaded exceptions in the nditer
  in
  the 'errmsg' parameter:
 
 
  http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext
 
  Worth considering is also the fact that the PyGILState API is
  incompatible
  with multiple embedded interpreters. Maybe that's not something anyone
  does
  with NumPy, though.
 
  -Mark
 
 
 
  Geoffrey




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread mark florisson
On 5 December 2011 17:57, mark florisson markflorisso...@gmail.com wrote:
 On 5 December 2011 17:48, Mark Wiebe mwwi...@gmail.com wrote:
 On Mon, Dec 5, 2011 at 9:37 AM, mark florisson markflorisso...@gmail.com
 wrote:

 On 5 December 2011 17:25, Mark Wiebe mwwi...@gmail.com wrote:
  On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote:
 
  snip
 
 
  Back to the bugs: here's a branch with all the changes I needed to get
  rational arithmetic to work:
 
     https://github.com/girving/numpy
 
  I discovered two more after the last email.  One is another simple 0
  vs. 1 bug, and another is somewhat optional:
 
  commit 730b05a892371d6f18d9317e5ae6dc306c0211b0
  Author: Geoffrey Irving irv...@naml.us
  Date:   Sun Dec 4 20:03:46 2011 -0800
 
     After loops, check for PyErr_Occurred() even if needs_api is 0
 
     For certain types of user defined classes, casting and ufunc loops
     normally run without the Python API, but occasionally need to throw
     an error.  Currently we assume that !needs_api means no errors occur.
     However, the fastest way to implement such loops is to run without
     the GIL normally and use PyGILState_Ensure/Release if an error
  occurs.
 
     In order to support this usage pattern, change all post-loop checks
  from
 
         needs_api && PyErr_Occurred()
 
     to simply
 
         PyErr_Occurred()
 
 
  To support this properly, I think we would need to convert needs_api
  into an
  enum with this hybrid mode as another case. While it isn't done
  currently, I
  was imagining using a thread pool to multithread the trivially
  data-parallel
  operations when needs_api is false, and I suspect the
  PyGILState_Ensure/Release would trigger undefined behavior in a thread
  created entirely outside of the Python system.

 PyGILState_Ensure/Release can be safely used by non-python threads
 with the only requirement that the GIL has been initialized previously
 in the main thread (PyEval_InitThreads).


 Is there a way this could efficiently be used to propagate any errors back
 to the main thread, for example using TBB as the thread pool? The innermost
 task code which calls the inner loop can't call PyErr_Occurred() without
 first calling PyGILState_Ensure itself, which would kill utilization.

 No, there is no way these things can be efficient, as the GIL is
 likely contended anyway (I wasn't making a point for these functions,
 just wanted to clarify). There is in fact the additional problem that
 PyGILState_Ensure would initialize a threadstate, you set an
 exception, and when you call PyGILState_Release the threadstate gets
 deleted along with the exception, before you will even have a chance
 to check with PyErr_Occurred().

To clarify, this case will only happen if you're doing this from a
non-Python thread that doesn't have a threadstate to begin with.

 For cython.parallel I worked around this by calling PyGILState_Ensure
 (to initialize the thread state), followed immediately by
 Py_BEGIN_ALLOW_THREADS before starting any work. You then have to
 fetch the exception and restore it in another thread when you want to
 propagate it. It's a total mess, it's inefficient and if you can avoid
 it you should.

 Maybe this is an ABI problem in NumPy that needs to be fixed, to mandate
 that inner loops always return an error code and disallow them from setting
 the Python exception state without returning failure.

 That would likely be the best thing.

 -Mark



  For comparison, I created a
  special mechanism for simplified multi-threaded exceptions in the nditer
  in
  the 'errmsg' parameter:
 
 
  http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext
 
  Worth considering is also the fact that the PyGILState API is
  incompatible
  with multiple embedded interpreters. Maybe that's not something anyone
  does
  with NumPy, though.
 
  -Mark
 
 
 
  Geoffrey




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy Governance

2011-12-05 Thread Mark Wiebe
On Sat, Dec 3, 2011 at 6:18 PM, Travis Oliphant teoliph...@gmail.com wrote:


 Hi everyone,

 There have been some wonderfully vigorous discussions over the past few
 months that have made it clear that we need some clarity about how
 decisions will be made in the NumPy community.

 When we were a smaller bunch of people it seemed easier to come to an
 agreement and things pretty much evolved based on (mostly) consensus and
 who was available to actually do the work.

 There is a need for a more clear structure so that we know how decisions
 will get made and so that code can move forward while paying attention to
 the current user-base.   There has been a steering committee structure
 for SciPy in the past, and I have certainly been prone to lump both NumPy
 and SciPy together given that I have a strong interest in and have spent a
 great amount of time working on both projects. Others have also spent
 time on both projects.

 However, I think it is critical at this stage to clearly separate the
 projects and define a governing structure that is fair and agreeable for
 NumPy.   SciPy has multiple modules and will probably need structure around
 each module independently. For now, I wanted to open up a discussion to
 see what people thought about NumPy's governance.

 My initial thoughts:

* discussions happen as they do now on the mailing list
* a small group of developers (5-11) constitute the board and
 major decisions are made by vote of that group (not just simple majority
 --- needs at least 2/3 +1 votes).
* votes are +1/+0/-0/-1
* if a topic is difficult to resolve it is moved off the main list
 and discussed on a separate board mailing list --- these should be rare,
 but parts of the NA discussion would probably qualify
* This board mailing list is publicly viewable but only board
 members may post.
* The board is renewed and adjusted each year --- based on
 nomination and 2/3 vote of the current board until board is at 11.
* The chairman of the board is voted by a majority of the board and
 has veto power unless over-ridden by 3/4 of the board.
* Petitions to remove people from the board can be made by 50+
 independent reverse nominations (hopefully people will just withdraw if
 they are no longer active).

 All of these points are open for discussion.  I just thought I would start
 the conversation.   I will be much more active this next year with NumPy
 and will be very interested in the direction NumPy is taking. I'm hoping
 to discern by this conversation, who else is very interested in the
 direction of NumPy so that the first board can be formally constituted.


I'm definitely in support of something along these lines. My experience
entering NumPy development was that the development process, coding
standards, and other aspects of the process are not very well specified,
and people have many differing ideas about what has already been agreed
upon. I would recommend that fixing this state of affairs be placed high on
the agenda of the board, with the goal of making it easier to attract new
developers.

A few people have proposed the BDFL approach, as in CPython development. In
practice, I believe Guido has done very well in the role because he only
uses the power as a last resort. Even if NumPy adopts a similar approach,
having a board along the lines Travis proposes would still be a good thing,
and having a BDFL would just mean that there's someone who could override
the will of the board and make an entirely different choice.

It may be worth considering how the governance structure is related to the
different levels of the NumPy codebase. There is a (very) small group of
people who have contributed significant amounts of C code, a larger group
of people who have contributed significant amounts of Python code, many
people who have contributed small C and/or Python patches, and a large
number of people who contribute bug reports, email list comments, etc. It
may be worth designing the board taking into account these different groups
of developers and users.

-Mark



 Best regards,

 -Travis



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy 1.7.0 release?

2011-12-05 Thread Ralf Gommers
Hi all,

It's been a little over 6 months since the release of 1.6.0 and the NA
debate has quieted down, so I'd like to ask your opinion on the timing of
1.7.0. It looks to me like we have a healthy amount of bug fixes and small
improvements, plus three larger chunks of work:

- datetime
- NA
- Bento support

My impression is that both datetime and NA are releasable, but should be
labeled tech preview or something similar, because they may still see
significant changes. Please correct me if I'm wrong.

There's still some maintenance work to do and pull requests to merge, but a
beta release by Christmas should be feasible. What do you all think?

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-05 Thread Travis Oliphant
I like the idea.   Is there resolution to the NA question?

--
Travis Oliphant
(on a mobile)
512-826-7480


On Dec 5, 2011, at 2:43 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote:

 Hi all,
 
 It's been a little over 6 months since the release of 1.6.0 and the NA debate 
 has quieted down, so I'd like to ask your opinion on the timing of 1.7.0. It 
 looks to me like we have a healthy amount of bug fixes and small 
 improvements, plus three larger chunks of work:
 
 - datetime
 - NA
 - Bento support
 
 My impression is that both datetime and NA are releasable, but should be 
 labeled tech preview or something similar, because they may still see 
 significant changes. Please correct me if I'm wrong. 
 
 There's still some maintenance work to do and pull requests to merge, but a 
 beta release by Christmas should be feasible. What do you all think?
 
 Cheers,
 Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy Governance

2011-12-05 Thread Charles R Harris
On Mon, Dec 5, 2011 at 12:43 PM, Benjamin Root ben.r...@ou.edu wrote:

 On Mon, Dec 5, 2011 at 12:06 PM, Mark Wiebe mwwi...@gmail.com wrote:

 On Sat, Dec 3, 2011 at 6:18 PM, Travis Oliphant teoliph...@gmail.com wrote:


 Hi everyone,

 There have been some wonderfully vigorous discussions over the past few
 months that have made it clear that we need some clarity about how
 decisions will be made in the NumPy community.

 When we were a smaller bunch of people it seemed easier to come to an
 agreement and things pretty much evolved based on (mostly) consensus and
 who was available to actually do the work.

 There is a need for a more clear structure so that we know how decisions
 will get made and so that code can move forward while paying attention to
 the current user-base.   There has been a steering committee structure
 for SciPy in the past, and I have certainly been prone to lump both NumPy
 and SciPy together given that I have a strong interest in and have spent a
 great amount of time working on both projects. Others have also spent
 time on both projects.

 However, I think it is critical at this stage to clearly separate the
 projects and define a governing structure that is fair and agreeable for
 NumPy.   SciPy has multiple modules and will probably need structure around
 each module independently. For now, I wanted to open up a discussion to
 see what people thought about NumPy's governance.

 My initial thoughts:

* discussions happen as they do now on the mailing list
* a small group of developers (5-11) constitute the board and
 major decisions are made by vote of that group (not just simple majority
 --- needs at least 2/3 +1 votes).
* votes are +1/+0/-0/-1
* if a topic is difficult to resolve it is moved off the main
 list and discussed on a separate board mailing list --- these should be
 rare, but parts of the NA discussion would probably qualify
* This board mailing list is publicly viewable but only board
 members may post.
* The board is renewed and adjusted each year --- based on
 nomination and 2/3 vote of the current board until board is at 11.
* The chairman of the board is voted by a majority of the board
 and has veto power unless over-ridden by 3/4 of the board.
* Petitions to remove people from the board can be made by 50+
 independent reverse nominations (hopefully people will just withdraw if
 they are no longer active).

 All of these points are open for discussion.  I just thought I would
 start the conversation.   I will be much more active this next year with
 NumPy and will be very interested in the direction NumPy is taking. I'm
 hoping to discern by this conversation, who else is very interested in the
 direction of NumPy so that the first board can be formally constituted.


 I'm definitely in support of something along these lines. My experience
 entering NumPy development was that the development process, coding
 standards, and other aspects of the process are not very well specified,
 and people have many differing ideas about what has already been agreed
 upon. I would recommend that fixing this state of affairs be placed high on
 the agenda of the board, with the goal of making it easier to attract new
 developers.

 A few people have proposed the BDFL approach, as in CPython development.
 In practice, I believe Guido has done very well in the role because he only
 uses the power as a last resort. Even if NumPy adopts a similar approach,
 having a board along the lines Travis proposes would still be a good thing,
 and having a BDFL would just mean that there's someone who could override
 the will of the board and make an entirely different choice.

 It may be worth considering how the governance structure is related to
 the different levels of the NumPy codebase. There is a (very) small group
 of people who have contributed significant amounts of C code, a larger
 group of people who have contributed significant amounts of Python code,
 many people who have contributed small C and/or Python patches, and a large
 number of people who contribute bug reports, email list comments, etc. It
 may be worth designing the board taking into account these different groups
 of developers and users.

 -Mark



 Best regards,

 -Travis


 Just some thoughts I have from this discussion.

 1. I think that we need to encourage and entice more NumPy
 developers/contributors.  Having a board of only a few core developers puts
 us right back in the same boat we were in during the whole NA discussion,
 only more codified.  Increasing the size of the board with more core
 developers would diversify thought and counter-act group-think.  I think
 that this problem needs to be solved before anything else.


Well, that's a tough one. Numpy development tends to attract folks with
spare time, i.e., students*, and those with an itch to scratch. Itch
scratched, degree obtained, they go back to their primary interest or on to
jobs and the rest of life.

Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-05 Thread Ralf Gommers
On Mon, Dec 5, 2011 at 9:13 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Mon, Dec 5, 2011 at 1:08 PM, Travis Oliphant oliph...@enthought.com wrote:

 I like the idea.   Is there resolution to the NA question?


 No, people still disagree and are likely to do so for years to come with
 no end in sight. That's why the preview label.

 Agreed that it's not resolved, but I think we at least got to the point
where we agreed not to back out the complete missing data additions. So if
we clearly say that we keep all options for future API changes open
(=preview label), I don't think that the issue should hold up a numpy
release indefinitely.

Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] What does fftn take as parameters?

2011-12-05 Thread Roger Binns
(Note I'm a programmer type, not a math type and am doing coding directed
by a matlab user.)

I'm trying to do an fft on multiple columns of data at once (ultimately
feeding into a correlation calculation).  I can use fft() to work on one
column:

  data=[23, 43, 53, 54, 0, 10]
  powtwo=8 # nearest power of two size
  numpy.fft.fft(data, powtwo)

I want to do that but using fftn (the matlab user said it is the right
function), but I can't work out from the docs or experimentation how the
input data should be formatted, e.g. whether it is row major or column
major.  For example the above could be:
  data=[ [23, 43, 53, 54, 0, 10] ]

 or

  data=[ [23], [43], [53], [54], [0], [10] ]

All the examples in the docs use square inputs (i.e. x and y axes are the
same length) so that doesn't help.  The documentation shows examples of
the output, but not the input.  I found code passing in a single int (not
a list of int) as the s parameter, but that also gives me an error.

Roger
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.7.0 release

2011-12-05 Thread Tim Burgess

 
 On Mon, Dec 5, 2011 at 9:13 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
 
 On Mon, Dec 5, 2011 at 1:08 PM, Travis Oliphant 
 oliph...@enthought.com wrote:
 
 I like the idea.   Is there resolution to the NA question?
 
 
 No, people still disagree and are likely to do so for years to come with
 no end in sight. That's why the preview label.
 
 Agreed that it's not resolved, but I think we at least got to the point
 where we agreed not to back out the complete missing data additions. So if
 we clearly say that we keep all options for future API changes open
 (=preview label), I don't think that the issue should hold up a numpy
 release indefinitely.
 
 Ralf


I think a release is a good idea. In addition to the previous points mentioned,
having NA in as a preview in a 1.7.0 release will likely raise its visibility
- a lot of people will read release notes of a newer version but won't ever
track discussions in a mailing list.



Tim Burgess

Software Engineer - Coral Reef Watch
Satellite Applications and Research - NESDIS
National Oceanic and Atmospheric Administration

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ignore NAN in numpy.true_divide()

2011-12-05 Thread questions anon
Maybe I am asking the wrong question or could go about this another way.
I have thousands of numpy arrays to flick through; could I just identify
which arrays have NaNs and, for now, ignore the entire array? Is there a
simple way to do this?
Any feedback will be greatly appreciated.

On Thu, Dec 1, 2011 at 12:16 PM, questions anon questions.a...@gmail.com wrote:

 I am trying to calculate the mean across many netcdf files. I cannot use
 numpy.mean because there are too many files to concatenate and I end up
 with a memory error. I have enabled the below code to do what I need but I
 have a few nan values in some of my arrays. Is there a way to ignore these
 somewhere in my code. I seem to face this problem often so I would love a
 command that ignores blanks in my array before I continue on to the next
 processing step.
 Any feedback is greatly appreciated.


 netCDF_list=[]
 for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder +
 '*/02/')+ glob.glob(MainFolder + '*/12/'):
 for ncfile in glob.glob(dir + '*.nc'):
 netCDF_list.append(ncfile)

 slice_counter=0
 print netCDF_list

 for filename in netCDF_list:
 ncfile=netCDF4.Dataset(filename)
 TSFC=ncfile.variables['T_SFC'][:]
 fillvalue=ncfile.variables['T_SFC']._FillValue
 TSFC=MA.masked_values(TSFC, fillvalue)
 for i in xrange(0,len(TSFC)-1,1):
 slice_counter +=1
 #print slice_counter
 try:
 running_sum=N.add(running_sum, TSFC[i])
 except NameError:
 print "Initiating the running total of my variable..."
 running_sum=N.array(TSFC[i])

 TSFC_avg=N.true_divide(running_sum, slice_counter)
 N.set_printoptions(threshold='nan')
 print "the TSFC_avg is:", TSFC_avg
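The running-sum loop above can be made NaN-tolerant by accumulating a
per-cell count of valid values instead of a single scalar counter. A hedged
sketch, assuming each slice is an equal-shaped NumPy array (the function
name is illustrative, not part of the original code):

```python
import numpy as np

def nan_aware_mean(slices):
    """Mean over equal-shaped arrays, ignoring NaN cells in each slice."""
    running_sum = None
    valid_count = None
    for s in slices:
        s = np.asarray(s, dtype=float)
        mask = np.isfinite(s)              # True where the cell is usable
        contrib = np.where(mask, s, 0.0)   # NaN cells contribute 0 to the sum
        if running_sum is None:
            running_sum = contrib.copy()
            valid_count = mask.astype(int)
        else:
            running_sum += contrib
            valid_count = valid_count + mask
    # Cells that were NaN in every slice divide 0 by 0 and stay NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.true_divide(running_sum, valid_count)

arrs = [np.array([1.0, np.nan]), np.array([3.0, 4.0])]
print(nan_aware_mean(arrs))  # the NaN cell is ignored: the means are 2 and 4
```

This keeps the constant-memory property of the running-sum approach while
only dropping the individual NaN cells rather than whole files.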


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ignore NAN in numpy.true_divide()

2011-12-05 Thread Xavier Barthelemy
Hi,
I don't know if it is the best choice, but this is what I do in my code:

for each slice:
  indexnonNaN=np.isfinite(SliceOfToto)
  SliceOfTotoWithoutNan=SliceOfToto[indexnonNaN]

and then perform all the operations I want on the last array.
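Concretely, the boolean mask returned by np.isfinite can index out just the
valid entries (the variable names below are illustrative stand-ins):

```python
import numpy as np

# A slice with a couple of NaN holes in it.
slice_of_toto = np.array([23.0, np.nan, 53.0, np.nan, 10.0])

index_non_nan = np.isfinite(slice_of_toto)        # boolean mask of valid cells
slice_without_nan = slice_of_toto[index_non_nan]  # keeps only 23, 53 and 10

print(slice_without_nan)
```

Note that the filtered array is shorter than the original, so this suits
per-slice statistics rather than element-wise accumulation across slices.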

I hope it answers your question

Xavier


2011/12/6 questions anon questions.a...@gmail.com

 Maybe I am asking the wrong question or could go about this another way.
 I have thousands of numpy arrays to flick through, could I just identify
 which arrays have NAN's and for now ignore the entire array. is there a
 simple way to do this?
 any feedback will be greatly appreciated.

 On Thu, Dec 1, 2011 at 12:16 PM, questions anon 
 questions.a...@gmail.com wrote:

 I am trying to calculate the mean across many netcdf files. I cannot use
 numpy.mean because there are too many files to concatenate and I end up
 with a memory error. I have enabled the below code to do what I need but I
 have a few nan values in some of my arrays. Is there a way to ignore these
 somewhere in my code. I seem to face this problem often so I would love a
 command that ignores blanks in my array before I continue on to the next
 processing step.
 Any feedback is greatly appreciated.


 netCDF_list=[]
 for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder +
 '*/02/')+ glob.glob(MainFolder + '*/12/'):
 for ncfile in glob.glob(dir + '*.nc'):
 netCDF_list.append(ncfile)

 slice_counter=0
 print netCDF_list

 for filename in netCDF_list:
 ncfile=netCDF4.Dataset(filename)
 TSFC=ncfile.variables['T_SFC'][:]
 fillvalue=ncfile.variables['T_SFC']._FillValue
 TSFC=MA.masked_values(TSFC, fillvalue)
 for i in xrange(0,len(TSFC)-1,1):
 slice_counter +=1
 #print slice_counter
 try:
 running_sum=N.add(running_sum, TSFC[i])
 except NameError:
 print "Initiating the running total of my variable..."
 running_sum=N.array(TSFC[i])

 TSFC_avg=N.true_divide(running_sum, slice_counter)
 N.set_printoptions(threshold='nan')
 print "the TSFC_avg is:", TSFC_avg







-- 
"When the government violates the people's rights, insurrection is, for
the people and for each portion of the people, the most sacred of rights
and the most indispensable of duties."

Declaration of the Rights of Man and of the Citizen, article 35, 1793
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy Governance

2011-12-05 Thread Stéfan van der Walt
On Mon, Dec 5, 2011 at 12:10 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Well, that's a tough one. Numpy development tends to attract folks with
 spare time, i.e., students*, and those with an itch to scratch. Itch
 scratched, degree obtained, they go back to their primary interest or on to
 jobs and the rest of life.

NumPy does seem to be different in this regard, in that many of the
developers stick around (even if they're not active on the code any
longer), think about potential issues and new directions, take part in
discussions, teach at conferences, organise workshops, write, etc.

I agree with Matthew that using a board should be a last resort, and
mildly disagree with Perry that it would be better to have a single
person make the final call.  The advantage of a benevolent dictator is
that you have a coherent driving vision, but at the cost of
sacrificing community ownership.

As for barriers to entry, improving the nature of discourse on the
mailing list (when it comes to thorny issues) would be good.
Technical barriers are not that hard to breach for our community;
setting the right social atmosphere is crucial.

Regards
Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ignore NAN in numpy.true_divide()

2011-12-05 Thread questions anon
Thanks for responding. I have tried several ways of adding the command, one
of which is:

for i in TSFC:
if N.any(N.isnan(TSFC)):
break
else:
pass
but nothing is happening. Is there some particular way I need to add this
command? I have posted all below:

netCDF_list=[]

for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder +
'*/02/')+ glob.glob(MainFolder + '*/12/'):
#print dir
for ncfile in glob.glob(dir + '*.nc'):
netCDF_list.append(ncfile)

slice_counter=0
print netCDF_list
for filename in netCDF_list:
ncfile=netCDF4.Dataset(filename)
TSFC=ncfile.variables['T_SFC'][:]
fillvalue=ncfile.variables['T_SFC']._FillValue
TSFC=MA.masked_values(TSFC, fillvalue)
for a in TSFC:
if N.any(N.isnan(TSFC)):
break
else:
pass

for i in xrange(0,len(TSFC)-1,1):
slice_counter +=1
#print slice_counter
try:
running_sum=N.add(running_sum, TSFC[i])
except NameError:
print "Initiating the running total of my variable..."
running_sum=N.array(TSFC[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg
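As an aside, the check above runs once per row of TSFC but tests the whole
array each time; a single per-array test before the averaging loop is
enough. A hedged sketch (skip_if_nan is an illustrative helper, not part of
the original code):

```python
import numpy as np

def skip_if_nan(arrays):
    """Yield only the arrays that are completely NaN-free."""
    for a in arrays:
        if np.any(np.isnan(a)):  # one check per array, not per row
            continue             # ignore the whole array for now
        yield a

clean = list(skip_if_nan([np.array([1.0, 2.0]),
                          np.array([3.0, np.nan])]))
print(len(clean))  # 1: the array containing a NaN was skipped
```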




On Tue, Dec 6, 2011 at 9:45 AM, David Cournapeau courn...@gmail.com wrote:

 On Mon, Dec 5, 2011 at 5:29 PM, questions anon questions.a...@gmail.com
 wrote:
  Maybe I am asking the wrong question or could go about this another way.
  I have thousands of numpy arrays to flick through, could I just identify
  which arrays have NAN's and for now ignore the entire array. is there a
  simple way to do this?

 Doing np.any(np.isnan(a)) for an array a should answer this exact question

 David

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ignore NAN in numpy.true_divide()

2011-12-05 Thread Xavier Barthelemy
Well, I would see two solutions:
1- to keep your code as it is, with a python list (you can stack numpy
arrays if they have the same dimensions):

for filename in netCDF_list:
ncfile=netCDF4.Dataset(filename)
TSFC=ncfile.variables['T_SFC'][:]
fillvalue=ncfile.variables['T_SFC']._FillValue
TSFC=MA.masked_values(TSFC, fillvalue)
TSFCWithOutNan=[]
for a in TSFC:
indexnonNaN=N.isfinite(a)
SliceofTotoWithoutNan=a[indexnonNaN]
print SliceofTotoWithoutNan
 TSFCWithOutNan.append(SliceofTotoWithoutNan)



for i in xrange(0,len(TSFCWithOutNan)-1,1):
slice_counter +=1
#print slice_counter
try:
running_sum=N.add(running_sum, TSFCWithOutNan[i])
except NameError:
print "Initiating the running total of my variable..."
running_sum=N.array(TSFCWithOutNan[i])
...

or 2- everything in the same loop:

slice_counter=0
for a in TSFC:
indexnonNaN=N.isfinite(a)
SliceofTotoWithoutNan=a[indexnonNaN]
slice_counter +=1
#print slice_counter
try:
running_sum=N.add(running_sum, SliceofTotoWithoutNan)
except NameError:
print "Initiating the running total of my variable..."
running_sum=N.array(SliceofTotoWithoutNan)
TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg

See if it works; it is just a rapid guess.
Xavier

for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder +
'*/02/')+ glob.glob(MainFolder + '*/12/'):

 #print dir

 for ncfile in glob.glob(dir + '*.nc'):
 netCDF_list.append(ncfile)

 slice_counter=0
 print netCDF_list
 for filename in netCDF_list:
 ncfile=netCDF4.Dataset(filename)
 TSFC=ncfile.variables['T_SFC'][:]
 fillvalue=ncfile.variables['T_SFC']._FillValue
 TSFC=MA.masked_values(TSFC, fillvalue)
 for a in TSFC:
 indexnonNaN=N.isfinite(a)
 SliceofTotoWithoutNan=a[indexnonNaN]
 print SliceofTotoWithoutNan
 TSFC=SliceofTotoWithoutNan


 for i in xrange(0,len(TSFC)-1,1):
 slice_counter +=1
 #print slice_counter
 try:
 running_sum=N.add(running_sum, TSFC[i])
 except NameError:
 print "Initiating the running total of my variable..."
 running_sum=N.array(TSFC[i])

 TSFC_avg=N.true_divide(running_sum, slice_counter)
 N.set_printoptions(threshold='nan')
 print "the TSFC_avg is:", TSFC_avg




 On Tue, Dec 6, 2011 at 9:50 AM, Xavier Barthelemy xab...@gmail.comwrote:

 Hi,
 I don't know if it is the best choice, but this is what I do in my code:

 for each slice:
   indexnonNaN=np.isfinite(SliceOfToto)
   SliceOfTotoWithoutNan=SliceOfToto[indexnonNaN]

 and then perform all the operations I want on the last array.

 I hope it answers your question.

 Xavier


 2011/12/6 questions anon questions.a...@gmail.com

  Maybe I am asking the wrong question or could go about this another way.
 I have thousands of numpy arrays to flick through; could I just identify
 which arrays have NaNs and, for now, ignore the entire array? Is there a
 simple way to do this?
 any feedback will be greatly appreciated.

 On Thu, Dec 1, 2011 at 12:16 PM, questions anon 
 questions.a...@gmail.com wrote:

 I am trying to calculate the mean across many netCDF files. I cannot
 use numpy.mean because there are too many files to concatenate and I end up
 with a memory error. I have enabled the below code to do what I need, but I
 have a few NaN values in some of my arrays. Is there a way to ignore these
 somewhere in my code? I seem to face this problem often, so I would love a
 command that ignores blanks in my array before I continue on to the next
 processing step.
 Any feedback is greatly appreciated.


 netCDF_list=[]
 for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder +
 '*/02/')+ glob.glob(MainFolder + '*/12/'):
 for ncfile in glob.glob(dir + '*.nc'):
 netCDF_list.append(ncfile)

 slice_counter=0
 print netCDF_list

 for filename in netCDF_list:
 ncfile=netCDF4.Dataset(filename)
 TSFC=ncfile.variables['T_SFC'][:]
 fillvalue=ncfile.variables['T_SFC']._FillValue
 TSFC=MA.masked_values(TSFC, fillvalue)
 for i in xrange(0,len(TSFC)-1,1):
 slice_counter +=1
 #print slice_counter
 try:
 

Re: [Numpy-discussion] ignore NAN in numpy.true_divide()

2011-12-05 Thread questions anon
Thanks again for your response. I must still be doing something wrong!!
Both options resulted in:
the TSFC_avg is: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

1st option:

slice_counter=0

for filename in netCDF_list:
ncfile=netCDF4.Dataset(filename)
TSFC=ncfile.variables['T_SFC'][:]
fillvalue=ncfile.variables['T_SFC']._FillValue
TSFC=MA.masked_values(TSFC, fillvalue)
TSFCWithOutNan=[]
for a in TSFC:
indexnonNaN=N.isfinite(a)
SliceofTotoWithoutNan=a[indexnonNaN]
print SliceofTotoWithoutNan
TSFCWithOutNan.append(SliceofTotoWithoutNan)
for i in xrange(0,len(TSFCWithOutNan)-1,1):
slice_counter +=1
try:
running_sum=N.add(running_sum, TSFCWithOutNan[i])
except NameError:
print "Initiating the running total of my variable..."
running_sum=N.array(TSFCWithOutNan[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg



the 2nd option :

for filename in netCDF_list:
ncfile=netCDF4.Dataset(filename)
TSFC=ncfile.variables['T_SFC'][:]
fillvalue=ncfile.variables['T_SFC']._FillValue
TSFC=MA.masked_values(TSFC, fillvalue)

slice_counter=0
for a in TSFC:
indexnonNaN=N.isfinite(a)
SliceofTotoWithoutNan=a[indexnonNaN]
slice_counter +=1
try:
running_sum=N.add(running_sum,
SliceofTotoWithoutNan)
except NameError:
 print "Initiating the running total of my variable..."
 running_sum=N.array(SliceofTotoWithoutNan)

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg
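[Editor's note: for reference, here is a minimal sketch (Python 3 / modern numpy, names are my own, not from the thread) of an element-wise running mean that skips NaN cells by summing only valid entries and counting them separately. One likely culprit in the code above is that dropping NaNs per slice changes each slice's shape, which breaks the element-wise N.add; keeping the shape and masking instead avoids that.]

```python
import numpy as np

def nan_ignoring_mean(slices):
    """Element-wise mean over equally-shaped arrays, ignoring NaN cells."""
    total = count = None
    for a in slices:
        valid = np.isfinite(a)              # True where the cell is usable
        filled = np.where(valid, a, 0.0)    # NaNs contribute zero to the sum
        if total is None:
            total = filled
            count = valid.astype(int)
        else:
            total += filled
            count += valid                  # booleans add as 0/1
    return total / np.maximum(count, 1)     # avoid 0/0 where no cell was ever valid

slices = [np.array([[1.0, np.nan], [3.0, 4.0]]),
          np.array([[3.0, 2.0], [np.nan, 6.0]])]
print(nan_ignoring_mean(slices))
# [[2. 2.]
#  [3. 5.]]
```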





On Tue, Dec 6, 2011 at 2:31 PM, Xavier Barthelemy xab...@gmail.com wrote:

 Well, I would see 2 solutions:
 1- to keep your code how it is, with a python list (you can stack numpy
 arrays if they have the same dimensions):

 for filename in netCDF_list:
 ncfile=netCDF4.Dataset(filename)
 TSFC=ncfile.variables['T_SFC'][:]
 fillvalue=ncfile.variables['T_SFC']._FillValue
 TSFC=MA.masked_values(TSFC, fillvalue)
 TSFCWithOutNan=[]
 for a in TSFC:
 indexnonNaN=N.isfinite(a)
 SliceofTotoWithoutNan=a[indexnonNaN]
 print SliceofTotoWithoutNan
 TSFCWithOutNan .append( SliceofTotoWithoutNan )



 for i in xrange(0,len(TSFCWithOutNan  )-1,1):

 slice_counter +=1
 #print slice_counter
 try:
 running_sum=N.add(running_sum,
 TSFCWithOutNan  [i])

 except NameError:
 print "Initiating the running total of my variable..."
 running_sum=N.array(TSFCWithOutNan  [i])
 ...

 or 2- everything in the same loop:

 slice_counter  =0
 for a in TSFC:
 indexnonNaN=N.isfinite(a)
 SliceofTotoWithoutNan=a[indexnonNaN]
 slice_counter +=1
 #print slice_counter
 try:
 running_sum=N.add(running_sum,
 SliceofTotoWithoutNan )

 except NameError:
 print "Initiating the running total of my variable..."
 running_sum=N.array( SliceofTotoWithoutNan
 )
 TSFC_avg=N.true_divide(running_sum, slice_counter)
 N.set_printoptions(threshold='nan')
 print "the TSFC_avg is:", TSFC_avg

 See if it works. It is just a rapid guess.
 Xavier


 for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder +
 '*/02/')+ glob.glob(MainFolder + '*/12/'):

 #print dir

 for ncfile in glob.glob(dir + '*.nc'):
 netCDF_list.append(ncfile)

 slice_counter=0
 print netCDF_list
 for filename in netCDF_list:
 ncfile=netCDF4.Dataset(filename)
 TSFC=ncfile.variables['T_SFC'][:]
 fillvalue=ncfile.variables['T_SFC']._FillValue
 TSFC=MA.masked_values(TSFC, fillvalue)
 for a in TSFC:
 indexnonNaN=N.isfinite(a)
 SliceofTotoWithoutNan=a[indexnonNaN]
 print SliceofTotoWithoutNan
 TSFC=SliceofTotoWithoutNan


 for i in xrange(0,len(TSFC)-1,1):
  

Re: [Numpy-discussion] Apparently non-deterministic behaviour of complex array multiplication

2011-12-05 Thread kneil

Hi Nathaniel, 
Thanks for the suggestion.  I more or less implemented it:

np.save('X',X);
X2=np.load('X.npy')
X2=np.asmatrix(X2)
diffy = (X != X2)
if diffy.any():
print X[diffy]
print X2[diffy]
print X[diffy][0].view(np.uint8)
print X2[diffy][0].view(np.uint8)
S=X*X.H/k
S2=X2*X2.H/k  

nanElts=find(isnan(S))
if len(nanElts)!=0:  
print 'WARNING: Nans in S:'+str(find(isnan(S)))
print 'WARNING: Nans in S2:'+str(find(isnan(S2)))



My output (when I got NaN) mostly indicated that both arrays are numerically
identical and that they evaluated to have the same NaN-valued entries.

For example
WARNING: Nans in S:[ 6 16]
WARNING: Nans in S2:[ 6 16]

Another time I got as output:

WARNING: Nans in S:[ 26  36  46  54  64  72  82  92 100 110 128 138 146
156 166 174 184 192
 202 212 220 230 240 250 260 268 278 279 296 297 306 314 324 334 335 342
 352 360 370 380 388 398 416 426 434 444 454 464 474]
WARNING: Nans in S2:[ 26  36  46  54  64  72  82  92 100 110 128 138 146
156 166 174 184 192
 202 212 220 230 240 250 260 268 278 279 296 297 306 314 324 334 335 342
 352 360 370 380 388 398 416 426 434 444 454 464 474]

These were different arrays, I think. At any rate, those two results appeared
from two runs of the exact same code. I do not use any random numbers in
the code, by the way. Most of the time the code runs without any NaN showing
up at all, so this is an improvement.

*I am pretty sure that one time there were nan in S, but not in S2, yet
still no difference was observed in the two matrices X and X2.  But, I did
not save that output, so I can't prove it to myself, ... but I am pretty
sure I saw that.

I will try and run memtest tonight.  I am going out of town for a week and
probably wont be able to test until next week.
cheers, 
Karl

One more observation:
1. I have many fewer NaN than I used to, but still get NaN in S,
but NOT in S2!



Nathaniel Smith wrote:
 
 If save/load actually makes a reliable difference, then it would be useful
 to do something like this, and see what you see:
 
 save("X", X)
 X2 = load("X.npy")
 diff = (X != X2)
 # did save/load change anything?
 any(diff)
 # if so, then what changed?
 X[diff]
 X2[diff]
 # any subtle differences in floating point representation?
 X[diff][0].view(np.uint8)
 X2[diff][0].view(np.uint8)
 
 (You should still run memtest. It's very easy - just install it with your
 package manager, then reboot. Hold down the shift key while booting, and
 you'll get a boot menu. Choose memtest, and then leave it to run
 overnight.)
 
 - Nathaniel
 On Dec 2, 2011 10:10 PM, kneil magnetotellur...@gmail.com wrote:
 
 

-- 
View this message in context: 
http://old.nabble.com/Apparently-non-deterministic-behaviour-of-complex-array-multiplication-tp32893004p32922174.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.



Re: [Numpy-discussion] NumPy Governance

2011-12-05 Thread Matthew Brett
Hi,

2011/12/5 Stéfan van der Walt ste...@sun.ac.za:
 As for barriers to entry, improving the the nature of discourse on the
 mailing list (when it comes to thorny issues) would be good.
 Technical barriers are not that hard to breach for our community;
 setting the right social atmosphere is crucial.

I'm just about to get on a plane and am going to be out of internet
range for a while, so, in the spirit of constructive discussion:

In the spirit of use-cases:

Would it be fair to say that the two contentious recent discussions have been:

The numpy ABI breakage, 2.0 vs 1.5.1 discussion
The masked array discussion(s) ?

What did we do wrong or right in each of these two discussions?  What
could we have done better?  What process would help us to do better?

Travis - for your board-only-post mailing list - my feeling is that
this is going in the wrong direction.  The effect of the board-only
mailing list is to explicitly remove non-qualified people from the
discussion.   This will make it more explicit that the substantial
decisions will be make by a few important people.   Do you (Travis -
or Mark?) think that, if this had happened earlier in the masked array
discussion, it would have been less contentious, or had more
substantial content?  My instinct would be the reverse, and the best
solution would have been to pause and commit to beating out the issues
and getting agreement.

See you,

Matthew


[Numpy-discussion] idea of optimisation?

2011-12-05 Thread Xavier Barthelemy
Hi everyone

I was wondering if there is a more optimal way to write what follows:
I am studying waves, so I have an array of wave crests positions, Xcrest
and the positions of the ZeroCrossings, Xzeros.

The goal is to find between which Xzeros my xcrest are.


XXX1=XCrest
CrestZerosNeighbour=np.zeros([len(XCrest),2], dtype='d')
for nn in range(len(Xzeros)-1):
X1=Xzeros[nn]
X2=Xzeros[nn+1]
indexxx1=np.where((X1 <= XXX1) & (XXX1 < X2))
try:
  CrestZerosNeighbour[indexxx1[0]]=np.array([X1,X2])
except:
  pass

Does someone have an idea? Something in the spirit of numpy.ma.masked_outside,
which does exactly the opposite of what I want: it masks an array outside an
interval. I would like to mask everything except the interval that contains
my value. I do this operation a large number of times, and a loop is time
consuming.

thanks
Xavier


-- 
 « Quand le gouvernement viole les droits du peuple, l'insurrection est,
pour le peuple et pour chaque portion du peuple, le plus sacré des droits
et le plus indispensable des devoirs »

Déclaration des droits de l'homme et du citoyen, article 35, 1793


Re: [Numpy-discussion] idea of optimisation?

2011-12-05 Thread David Froger
Excerpts from Xavier Barthelemy's message of mar. déc. 06 06:53:09 +0100 2011:
 Hi everyone
 
 I was wondering if there is a more optimal way to write what follows:
 I am studying waves, so I have an array of wave crests positions, Xcrest
 and the positions of the ZeroCrossings, Xzeros.
 
 The goal is to find between which Xzeros my xcrest are.
 
 
 XXX1=XCrest
 CrestZerosNeighbour=np.zeros([len(XCrest),2], dtype='d')
 for nn in range(len(Xzeros)-1):
 X1=Xzeros[nn]
 X2=Xzeros[nn+1]
 indexxx1=np.where((X1 <= XXX1) & (XXX1 < X2))
 try:
   CrestZerosNeighbour[indexxx1[0]]=np.array([X1,X2])
 except:
   pass
 
 Someone has an idea? in the spirit of (numpy.ma.masked_outside) which does
 exactly the opposite I want: it masks an array outside an interval. I would
 like to mask everything except the interval that contains my value.
 I do this operation a large number of times , and a loop is time consuming.

Hi,

My first idea would be to write a function in C or Fortran that returns Xzeros
indices (instead of values). Algorithms may be optimized according to the inputs:
if the XCrest and Xzeros are sorted, or if len(Xcrest) << len(Xzeros), using
dichotomy... But I would be interested to see a solution with masked arrays too.

-- 


Re: [Numpy-discussion] What does fftn take as parameters?

2011-12-05 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 05/12/11 14:19, David Cournapeau wrote:
 I am not sure I understand what you are trying to do?

I had a slight misunderstanding with the math guy and had believed that
for our purposes we could feed in 16 columns and get one column of fft
output. However we do actually need 16 columns of output, each
corresponding to a column of input.

It seems he is obsessed with optimisation: apparently, when calculating an
FFT of a known size, operating on all 16 columns at once rather than doing
them one at a time would save some redundant calculations. That is what he
assumed fftn did from the description.
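[Editor's note: a minimal sketch of the axis=0 usage David mentions — one 1-D FFT per column in a single call. The array shape is just an example; the numpy calls are real.]

```python
import numpy as np

# 1024 samples for each of 16 channels, one channel per column.
rng = np.random.default_rng(0)
data = rng.random((1024, 16))

# axis=0 runs the FFT down each column; the default axis=-1 would
# instead transform each row.  fftn, by contrast, performs a full
# N-dimensional transform, mixing the columns together.
spectra = np.fft.fft(data, axis=0)

print(spectra.shape)  # (1024, 16): one spectrum per input column
```

Each output column equals the 1-D FFT of the corresponding input column, so there is no need to loop over columns in Python.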

 numpy.fft.fft will compute fft on every *row*, or every column if you
 say pass axis=0 argument:

Note that I am using regular Python lists (they were JSON at one point)
and the fft documentation is incomprehensible to someone who hasn't used
numpy before and only cares about fft (there are a lot of matches for
Google searches about fft and python pointing to numpy).

The doc doesn't actually say what axis is and doesn't have an example.
Additionally a shape attribute is used which is peculiar to whatever
numpy uses as its data representation.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk7dyRwACgkQmOOfHg372QToxgCfR7IoUfgGQVZEEiElnjbtx7yx
R8EAnRfDg4y7AfFeSA8sQxVCq6ucgRG1
=gg2h
-END PGP SIGNATURE-


Re: [Numpy-discussion] idea of optimisation?

2011-12-05 Thread Xavier Barthelemy
OK, let me be more precise.

I have a Z array which is the elevation.
From this I extract a discrete array of zero crossings, and another discrete
array of crests.
len(Crest) is different from len(Xzeros). I have a threshold method to
detect my valid crests, and sometimes there are 2 crests between two
zero-crossings (grouping effect).

Crest and Zeros are 2 different arrays, with positions. Example:
Zeros=[1,2,3,4], Crests=[1.5,1.7,3.5]


And yes, the arrays can be sorted; not a problem with this.
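[Editor's note: given sorted zero-crossings, the bracketing interval for every crest can be found without a Python loop using np.searchsorted. A sketch with the example values from this message; variable names follow the thread.]

```python
import numpy as np

Zeros = np.array([1.0, 2.0, 3.0, 4.0])
Crests = np.array([1.5, 1.7, 3.5])

# Index of the zero-crossing at or just below each crest.
# side='right' makes the bracketing interval [X1, X2) half-open,
# matching the X1 <= x < X2 test in the original loop.
idx = np.searchsorted(Zeros, Crests, side='right') - 1
idx = np.clip(idx, 0, len(Zeros) - 2)   # crests outside the range get clipped

CrestZerosNeighbour = np.column_stack((Zeros[idx], Zeros[idx + 1]))
print(CrestZerosNeighbour)
# [[1. 2.]
#  [1. 2.]
#  [3. 4.]]
```

searchsorted is a binary search (the "dichotomy" David suggested), so this is O(len(Crests) * log(len(Zeros))) with all the work done in C.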

Xavier

2011/12/6 David Froger david.fro...@gmail.com

 Excerpts from Xavier Barthelemy's message of mar. déc. 06 06:53:09 +0100
 2011:
  Hi everyone
 
  I was wondering if there is a more optimal way to write what follows:
  I am studying waves, so I have an array of wave crests positions, Xcrest
  and the positions of the ZeroCrossings, Xzeros.
 
  The goal is to find between which Xzeros my xcrest are.
 
 
  XXX1=XCrest
  CrestZerosNeighbour=np.zeros([len(XCrest),2], dtype='d')
  for nn in range(len(Xzeros)-1):
  X1=Xzeros[nn]
  X2=Xzeros[nn+1]
  indexxx1=np.where((X1 <= XXX1) & (XXX1 < X2))
  try:
CrestZerosNeighbour[indexxx1[0]]=np.array([X1,X2])
  except:
pass
 
  Someone has an idea? in the spirit of (numpy.ma.masked_outside) which
 does
  exactly the opposite I want: it masks an array outside an interval. I
 would
  like to mask everything except the interval that contains my value.
  I do this operation a large number of times , and a loop is time
 consuming.

 Hi,

 My first idea  would be to write a  function in C or Fortran  that return
 Xzeros
 index (instead of values).  Algorithms may be optimized according to the
 inputs:
 if the XCrest and Xzeros are sorted, or if len(Xcrest) << len(Xzeros), using
 dichotomy...  But I would be interested to see a solution with masked
 array too.

 --




-- 
 « Quand le gouvernement viole les droits du peuple, l'insurrection est,
pour le peuple et pour chaque portion du peuple, le plus sacré des droits
et le plus indispensable des devoirs »

Déclaration des droits de l'homme et du citoyen, article 35, 1793