Re: [Numpy-discussion] numpy.array() of mixed integers and strings can truncate data
On Fri, Dec 2, 2011 at 18:53, Charles R Harris charlesr.har...@gmail.com wrote: After sleeping on this, I think an object array in this situation would be the better choice and wouldn't result in lost information. This might change the behavior of some functions, though, so it would need testing.

I tried to come up with a simple patch to achieve this, but I think it's beyond me, particularly since something different probably has to happen for these two cases: np.array([1234, 'ab']) and np.array([1234]).astype('|S2'). I tried a few things (changing the rules in PyArray_PromoteTypes(), among other places), but I'm more likely to break some corner case than fix this cleanly. I filed a ticket (#1990) and a pull request adding a test to the 1.6.x maintenance branch, for someone more knowledgeable than me to address. I wrote the test so that either choice (dtype=object or a string dtype of the required length) passes.

Ray Jones ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
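The failure mode under discussion fits in a few lines. On the 1.6-era NumPy being reported against, np.array([1234, 'ab']) picked the two-character string dtype from 'ab' and silently truncated 1234 to '12'; forcing dtype=object, the choice Charles suggests, keeps both values intact on any version. A minimal sketch:

```python
import numpy as np

# The fix suggested above: with dtype=object, nothing is truncated,
# regardless of what the promotion rules decide for the mixed list.
mixed = np.array([1234, 'ab'], dtype=object)
print(mixed)

# The second case from the ticket: here the narrow string cast is explicit,
# so truncation is arguably what the user asked for, and different behavior
# may be appropriate.
truncated = np.array([1234]).astype('|S2')
print(truncated)   # [b'12']
```

The distinction between the two cases is the crux of the ticket: implicit promotion should not lose data, while an explicit narrow astype plausibly may.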
Re: [Numpy-discussion] NumPy Governance
I'm not sure I'm crazy about leaving final decision making to a board. A board may be a good way of carefully considering the issues, and it could make its own recommendation (with a sufficient majority). But in the end I think one person needs to decide (and that decision may go against the board consensus, presumably only rarely). Why shouldn't that person be you? Perry

On Dec 4, 2011, at 11:32 PM, Travis Oliphant wrote: Great points. My initial suggestion of 5-11 was more about current board size rather than trying to fix it. I agree that having someone represent major downstream projects would be a great thing. -Travis

On Dec 4, 2011, at 7:16 AM, Alan G Isaac wrote: On 12/4/2011 1:43 AM, Charles R Harris wrote: I don't think there are 5 active developers, let alone 11. With hard work you might scrape together two or three. Having 5 or 11 people making decisions for the two or three actually doing the work isn't going to go over well. Very true! But you might consider including on any board a developer or two from important projects that are very NumPy dependent (e.g., Matplotlib). One other thing: how about starting with a board of 3 and a rule that says any active developer can request to join, that additions are determined by majority vote of the existing board, and that having the board both small and odd numbered is a priority? (Fixing the board size in advance for a project we all hope will grow substantially seems odd.) fwiw, Alan Isaac

--- Travis Oliphant Enthought, Inc. oliph...@enthought.com 1-512-536-1057 http://www.enthought.com
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
Hi Geoffrey, On Mon, Dec 5, 2011 at 12:37 AM, Geoffrey Irving irv...@naml.us wrote: On Sun, Dec 4, 2011 at 6:45 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Dec 4, 2011 at 6:59 PM, Geoffrey Irving irv...@naml.us wrote: On Sun, Dec 4, 2011 at 5:18 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Dec 4, 2011 at 5:41 PM, Geoffrey Irving irv...@naml.us wrote: This may be the problem. Simple diffs are pleasant. I'm guessing this code doesn't get a lot of testing. Glad it's there, though! Geoffrey diff --git a/numpy/core/src/umath/ufunc_type_resolution.c b/numpy/core/src/umath/ufunc_type_resolution.c index 0d6cf19..a93eda1 100644 --- a/numpy/core/src/umath/ufunc_type_resolution.c +++ b/numpy/core/src/umath/ufunc_type_resolution.c @@ -1866,7 +1866,7 @@ linear_search_type_resolver(PyUFuncObject *self, case -1: return -1; /* A loop was found */ -case 1: +case 0: return 0; } } Heh. Can you verify that this fixes the problem? That function is only called once and its return value is passed up the chain, but the documented return values of that calling function are -1, 0. So the documentation needs to be changed if this is the right thing to do. Actually, that patch was wrong, since linear_search_userloop_type_resolver needs to return three values (error, not-found, success). A better patch follows. I can confirm that this gets me further, but I get other failures down the line, so more fixes may follow. I'll push the branch with all my fixes for convenience once I have everything working. Speaking of tests... I was wondering if you could be talked into putting together a simple user type for including in the tests? Yep, though likely not for a couple weeks. If there's interest, I could also be convinced to sanitize my entire rational class so you could include that directly. Currently it's both C++ and uses some gcc specific features like __int128_t. 
Basically it's numerator/denominator, where both are 64 bit integers, and an OverflowError is thrown if anything can't be represented as such (possibly a different exception would be better in cases like (1<<64)/((1<<64)+1)). It would be easy to generalize it to rational32 vs. rational64 as well. If you want tests but not rational, it would be straightforward to strip what I have down to a bare bones test case. We'll see how much interest there is. If it becomes official you may get more feedback on features. There are some advantages to having some user types in numpy. One is that otherwise they tend to get lost, another is that having a working example or two provides templates for others to work from, and finally they provide test material. Because official user types aren't assigned anywhere there might also be some conflicts. Maybe something like an extension types module would be a way around that. In any case, I think both rational numbers and quaternions would be useful to have and I hope there is some discussion of how to do that. Rationals may be a bit trickier than quaternions though, as usually they are used to provide exact arithmetic without concern for precision. I don't know how restrictive the 64 bit limitation will be in practice. What are you using them for? I'm using them for frivolous analysis of poker Nash equilibria. I'll let others decide if it has any non-toy uses. 64 bits seems to be enough for me, though it's possible that I'll run into trouble with other examples. It's still exact, though, in the sense that it throws an exception rather than doing anything weird if it overflows. And it has the key advantage of being orders of magnitude faster than object arrays of Fractions. Back to the bugs: here's a branch with all the changes I needed to get rational arithmetic to work: https://github.com/girving/numpy I discovered two more after the last email. One is another simple 0 vs.
1 bug, and another is somewhat optional: commit 730b05a892371d6f18d9317e5ae6dc306c0211b0 Author: Geoffrey Irving irv...@naml.us Date: Sun Dec 4 20:03:46 2011 -0800 After loops, check for PyErr_Occurred() even if needs_api is 0 For certain types of user defined classes, casting and ufunc loops normally run without the Python API, but occasionally need to throw an error. Currently we assume that !needs_api means no error occurred. However, the fastest way to implement such loops is to run without the GIL normally and use PyGILState_Ensure/Release if an error occurs. In order to support this usage pattern, change all post-loop checks from needs_api && PyErr_Occurred() to simply PyErr_Occurred() Thanks. Could you put this work into a separate branch, say fixuserloops, and enter a
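Geoffrey's rational dtype is implemented in C, but its overflow contract is easy to sketch in Python: reduce the fraction, then refuse any value whose reduced numerator or denominator leaves signed 64-bit range. This is a hypothetical sketch built on fractions.Fraction, not the actual implementation:

```python
from fractions import Fraction

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def checked_rational(num, den):
    # Hypothetical sketch of the contract described above: reduce num/den,
    # then raise OverflowError if either reduced field falls outside the
    # signed 64-bit range the C dtype can store.
    f = Fraction(num, den)
    if not (INT64_MIN <= f.numerator <= INT64_MAX
            and INT64_MIN <= f.denominator <= INT64_MAX):
        raise OverflowError("not representable as 64-bit rational: %r" % f)
    return f

print(checked_rational(2, 4))                # 1/2
try:
    checked_rational(1 << 64, (1 << 64) + 1)  # the (1<<64)/((1<<64)+1) case
except OverflowError as e:
    print("overflow:", e)
```

The exception-on-overflow design keeps arithmetic exact: results are either correct or refused, never silently wrong, which is the property the C dtype trades a fixed word size for.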
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On Sun, Dec 4, 2011 at 9:45 PM, Charles R Harris charlesr.har...@gmail.com wrote: We'll see how much interest there is. If it becomes official you may get more feedback on features. There are some advantages to having some user types in numpy. One is that otherwise they tend to get lost, another is that having a working example or two provides templates for others to work from, and finally they provide test material. Because official user types aren't assigned anywhere there might also be some conflicts. Maybe something like an extension types module would be a way around that. In any case, I think both rational numbers and quaternions would be useful to have and I hope there is some discussion of how to do that. I agree that those will be useful, but I am worried about adding more stuff in multiarray. User-types should really be separated from multiarray. Ideally they should be plugins, but separating them from multiarray would be a good first step. I realize it is a bit unfair to have this ready for Geoffrey's code changes, but depending on the timelines for the 2.0.0 milestone, I think this would be a useful thing to have. Otherwise, if some ABI/API changes are needed after 2.0, we will be dragged down with this for years. I am willing to spend time on this. Geoffrey, does this sound acceptable to you? David
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote: snip Back to the bugs: here's a branch with all the changes I needed to get rational arithmetic to work: https://github.com/girving/numpy I discovered two more after the last email. One is another simple 0 vs. 1 bug, and another is somewhat optional: commit 730b05a892371d6f18d9317e5ae6dc306c0211b0 Author: Geoffrey Irving irv...@naml.us Date: Sun Dec 4 20:03:46 2011 -0800 After loops, check for PyErr_Occurred() even if needs_api is 0 For certain types of user defined classes, casting and ufunc loops normally run without the Python API, but occasionally need to throw an error. Currently we assume that !needs_api means no error occurred. However, the fastest way to implement such loops is to run without the GIL normally and use PyGILState_Ensure/Release if an error occurs. In order to support this usage pattern, change all post-loop checks from needs_api && PyErr_Occurred() to simply PyErr_Occurred() To support this properly, I think we would need to convert needs_api into an enum with this hybrid mode as another case. While it isn't done currently, I was imagining using a thread pool to multithread the trivially data-parallel operations when needs_api is false, and I suspect the PyGILState_Ensure/Release would trigger undefined behavior in a thread created entirely outside of the Python system. For comparison, I created a special mechanism for simplified multi-threaded exceptions in the nditer in the 'errmsg' parameter: http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext Worth considering is also the fact that the PyGILState API is incompatible with multiple embedded interpreters. Maybe that's not something anyone does with NumPy, though.
-Mark
Re: [Numpy-discussion] NumPy Governance
On 12/05/2011 06:22 AM, Perry Greenfield wrote: I'm not sure I'm crazy about leaving final decision making to a board. A board may be a good way of carefully considering the issues, and it could make its own recommendation (with a sufficient majority). But in the end I think one person needs to decide (and that decision may go against the board consensus, presumably only rarely). Why shouldn't that person be you? Perry

I have similar thoughts, because I just do not see how a board would work, especially given that anyone can be a 'core developer'; the distributed aspect removes that 'entry barrier'. I also think that there needs to be something formal like the Linux Kernel Summit (see the excellent coverage by LWN.net: http://lwn.net/Articles/KernelSummit2011/). I know that people get together to talk at meetings or via invitation (http://blog.fperez.org/2011/05/austin-trip-ipython-at-tacc-and.html). This would provide a good opportunity to hash out concerns, introduce new features and identify community needs that cannot be adequately addressed via electronic communication. The datarray is a 'good' example of how this could work, except that it has not been pushed upstream yet! (It would be an excellent example if it had been pushed upstream :-) hint, hint.) I also must disagree with Travis's statement that discussions happen as they do now on the mailing list. This is simply not true, because the mailing lists, tickets and pull requests are not connected, so these have their own discussion threads. Sure, there are some nice examples; Mark did tell us about the NA branch, but the actual merge was still a surprise. So I think these need better communication, such as emailing the list with a set 'public comment period' before requests are merged (with longer periods for major changes). Bruce

On Dec 4, 2011, at 11:32 PM, Travis Oliphant wrote: Great points. My initial suggestion of 5-11 was more about current board size rather than trying to fix it.
I agree that having someone represent from major downstream projects would be a great thing. -Travis On Dec 4, 2011, at 7:16 AM, Alan G Isaac wrote: On 12/4/2011 1:43 AM, Charles R Harris wrote: I don't think there are 5 active developers, let alone 11. With hard work you might scrape together two or three. Having 5 or 11 people making decisions for the two or three actually doing the work isn't going to go over well. Very true! But you might consider including on any board a developer or two from important projects that are very NumPy dependent. (E.g., Matplotlib.) One other thing: how about starting with a board of 3 and a rule that says any active developer can request to join, that additions are determined by majority vote of the existing board, and that having the board both small and odd numbered is a priority? (Fixing the board size in advance for a project we all hope will grow substantially seems odd.) fwiw, Alan Isaac
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On 5 December 2011 17:25, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote: snip Back to the bugs: here's a branch with all the changes I needed to get rational arithmetic to work: https://github.com/girving/numpy I discovered two more after the last email. One is another simple 0 vs. 1 bug, and another is somewhat optional: commit 730b05a892371d6f18d9317e5ae6dc306c0211b0 Author: Geoffrey Irving irv...@naml.us Date: Sun Dec 4 20:03:46 2011 -0800 After loops, check for PyErr_Occurred() even if needs_api is 0 For certain types of user defined classes, casting and ufunc loops normally run without the Python API, but occasionally need to throw an error. Currently we assume that !needs_api means no error occurred. However, the fastest way to implement such loops is to run without the GIL normally and use PyGILState_Ensure/Release if an error occurs. In order to support this usage pattern, change all post-loop checks from needs_api && PyErr_Occurred() to simply PyErr_Occurred() To support this properly, I think we would need to convert needs_api into an enum with this hybrid mode as another case. While it isn't done currently, I was imagining using a thread pool to multithread the trivially data-parallel operations when needs_api is false, and I suspect the PyGILState_Ensure/Release would trigger undefined behavior in a thread created entirely outside of the Python system. PyGILState_Ensure/Release can be safely used by non-python threads, with the only requirement that the GIL has been initialized previously in the main thread (PyEval_InitThreads). For comparison, I created a special mechanism for simplified multi-threaded exceptions in the nditer in the 'errmsg' parameter: http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext Worth considering is also the fact that the PyGILState API is incompatible with multiple embedded interpreters.
Maybe that's not something anyone does with NumPy, though. -Mark
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On Mon, Dec 5, 2011 at 8:58 AM, David Cournapeau courn...@gmail.com wrote: On Sun, Dec 4, 2011 at 9:45 PM, Charles R Harris charlesr.har...@gmail.com wrote: We'll see how much interest there is. If it becomes official you may get more feedback on features. There are some advantages to having some user types in numpy. One is that otherwise they tend to get lost, another is that having a working example or two provides templates for others to work from, and finally they provide test material. Because official user types aren't assigned anywhere there might also be some conflicts. Maybe something like an extension types module would be a way around that. In any case, I think both rational numbers and quaternions would be useful to have and I hope there is some discussion of how to do that. I agree that those will be useful, but I am worried about adding more stuff in multiarray. User-types should really be separated from multiarray. Ideally they should be plugins, but separating them from multiarray would be a good first step. I think the object and datetime dtypes should also be moved out of the core multiarray module at some point. The user-type mechanism could be improved a lot based on Martin's feedback after he did the quaternion implementation, and needs further expansion to be able to support object and datetime arrays as currently implemented. I realize it is a bit unfair to have this ready for Geoffrey's code changes, but depending on the timelines for the 2.0.0 milestone, I think this would be a useful thing to have. Otherwise, if some ABI/API changes are needed after 2.0, we will be dragged down with this for years. I am willing to spend time on this. Geoffrey, does this sound acceptable to you? A rational type could be added without breaking the ABI, in the same way it was done for datetime and half in 1.6.
I think the revamp of the user-type mechanism needs its own NEP design document, because changing it will be a very delicate operation in dealing with how it interacts with the NumPy core, and making it much more programmer-friendly will take a fair number of design iterations. -Mark David
[Numpy-discussion] astype does not work with NA object
Hi, I mistakenly filed ticket 1973 "Can not display a masked array containing np.NA values even if masked" (http://projects.scipy.org/numpy/ticket/1973) against masked array because that was where I found it. But the actual error is that the astype function does not handle the NA object:

$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'2.0.0.dev-059334c'
>>> np.array([1,2,3,4]).astype(float)
array([ 1.,  2.,  3.,  4.])
>>> np.array([1,2,3,np.NA]).astype(float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Cannot assign NA to an array which does not support NAs
>>> a = np.array([1,2,3,4], maskna=True)
>>> a[3] = np.NA
>>> a
array([1, 2, 3, NA])
>>> a.astype(float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Cannot assign NA to an array which does not support NAs
>>> a*1.0
array([ 1.,  2.,  3.,  NA])

Bruce
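The maskna branch and np.NA discussed in this ticket never shipped in a NumPy release, but np.ma is the closest released analogue, and it shows the behavior Bruce is asking astype to have: convert the data and carry the mask through. A sketch using the released masked-array API:

```python
import numpy as np

# np.ma analogue of the maskna example above: the fourth element is masked,
# and astype converts the data while preserving the mask.
a = np.ma.array([1, 2, 3, 4], mask=[False, False, False, True])
b = a.astype(float)
print(b)   # masked element prints as --
```

Under maskna the intended semantics were presumably the same: astype on an NA-supporting array should produce an NA-supporting array, rather than rejecting the NA values as the traceback shows.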
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On Mon, Dec 5, 2011 at 9:37 AM, mark florisson markflorisso...@gmail.com wrote: On 5 December 2011 17:25, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote: snip Back to the bugs: here's a branch with all the changes I needed to get rational arithmetic to work: https://github.com/girving/numpy I discovered two more after the last email. One is another simple 0 vs. 1 bug, and another is somewhat optional: commit 730b05a892371d6f18d9317e5ae6dc306c0211b0 Author: Geoffrey Irving irv...@naml.us Date: Sun Dec 4 20:03:46 2011 -0800 After loops, check for PyErr_Occurred() even if needs_api is 0 For certain types of user defined classes, casting and ufunc loops normally run without the Python API, but occasionally need to throw an error. Currently we assume that !needs_api means no error occurred. However, the fastest way to implement such loops is to run without the GIL normally and use PyGILState_Ensure/Release if an error occurs. In order to support this usage pattern, change all post-loop checks from needs_api && PyErr_Occurred() to simply PyErr_Occurred() To support this properly, I think we would need to convert needs_api into an enum with this hybrid mode as another case. While it isn't done currently, I was imagining using a thread pool to multithread the trivially data-parallel operations when needs_api is false, and I suspect the PyGILState_Ensure/Release would trigger undefined behavior in a thread created entirely outside of the Python system. PyGILState_Ensure/Release can be safely used by non-python threads, with the only requirement that the GIL has been initialized previously in the main thread (PyEval_InitThreads). Is there a way this could efficiently be used to propagate any errors back to the main thread, for example using TBB as the thread pool? The innermost task code which calls the inner loop can't call PyErr_Occurred() without first calling PyGILState_Ensure itself, which would kill utilization.
Maybe this is an ABI problem in NumPy that needs to be fixed, to mandate that inner loops always return an error code and disallow them from setting the Python exception state without returning failure. -Mark For comparison, I created a special mechanism for simplified multi-threaded exceptions in the nditer in the 'errmsg' parameter: http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext Worth considering is also the fact that the PyGILState API is incompatible with multiple embedded interpreters. Maybe that's not something anyone does with NumPy, though. -Mark
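The convention Mark proposes, where inner loops report failure through a return code and never touch the exception state directly, can be sketched in Python. The names here are illustrative, not NumPy API:

```python
# Sketch of the proposed convention (hypothetical names): the inner loop
# returns 0 on success or -1 on failure and records a message in a buffer;
# it never sets the (Python) exception state itself. Only the caller raises,
# once, after the loop finishes. This is GIL-free-friendly: workers need no
# thread state just to signal an error.

def inner_loop(values, out, errbuf):
    for i, v in enumerate(values):
        if v == 0:
            errbuf.append("division by zero at index %d" % i)
            return -1                        # error code instead of raising
        out.append(1.0 / v)
    return 0

def apply_loop(values):
    out, errbuf = [], []
    if inner_loop(values, out, errbuf) < 0:
        raise ZeroDivisionError(errbuf[0])   # raised by the caller, once
    return out

print(apply_loop([1, 2, 4]))                 # [1.0, 0.5, 0.25]
```

The design choice is the same one the nditer 'errmsg' parameter makes: the error travels as data (a code plus a message buffer) until it reaches a context that is allowed to raise.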
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On 5 December 2011 17:48, Mark Wiebe mwwi...@gmail.com wrote: On Mon, Dec 5, 2011 at 9:37 AM, mark florisson markflorisso...@gmail.com wrote: On 5 December 2011 17:25, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Dec 4, 2011 at 11:37 PM, Geoffrey Irving irv...@naml.us wrote: snip Back to the bugs: here's a branch with all the changes I needed to get rational arithmetic to work: https://github.com/girving/numpy I discovered two more after the last email. One is another simple 0 vs. 1 bug, and another is somewhat optional: commit 730b05a892371d6f18d9317e5ae6dc306c0211b0 Author: Geoffrey Irving irv...@naml.us Date: Sun Dec 4 20:03:46 2011 -0800 After loops, check for PyErr_Occurred() even if needs_api is 0 For certain types of user defined classes, casting and ufunc loops normally run without the Python API, but occasionally need to throw an error. Currently we assume that !needs_api means no error occurred. However, the fastest way to implement such loops is to run without the GIL normally and use PyGILState_Ensure/Release if an error occurs. In order to support this usage pattern, change all post-loop checks from needs_api && PyErr_Occurred() to simply PyErr_Occurred() To support this properly, I think we would need to convert needs_api into an enum with this hybrid mode as another case. While it isn't done currently, I was imagining using a thread pool to multithread the trivially data-parallel operations when needs_api is false, and I suspect the PyGILState_Ensure/Release would trigger undefined behavior in a thread created entirely outside of the Python system. PyGILState_Ensure/Release can be safely used by non-python threads, with the only requirement that the GIL has been initialized previously in the main thread (PyEval_InitThreads). Is there a way this could efficiently be used to propagate any errors back to the main thread, for example using TBB as the thread pool?
The innermost task code which calls the inner loop can't call PyErr_Occurred() without first calling PyGILState_Ensure itself, which would kill utilization. No, there is no way these things can be efficient, as the GIL is likely contended anyway (I wasn't making a point for these functions, just wanted to clarify). There is in fact the additional problem that PyGILState_Ensure would initialize a threadstate, you set an exception, and when you call PyGILState_Release the threadstate gets deleted along with the exception, before you will even have a chance to check with PyErr_Occurred(). For cython.parallel I worked around this by calling PyGILState_Ensure (to initialize the thread state), followed immediately by Py_BEGIN_ALLOW_THREADS before starting any work. You then have to fetch the exception and restore it in another thread when you want to propagate it. It's a total mess, it's inefficient, and if you can avoid it you should. Maybe this is an ABI problem in NumPy that needs to be fixed, to mandate that inner loops always return an error code and disallow them from setting the Python exception state without returning failure. That would likely be the best thing. -Mark For comparison, I created a special mechanism for simplified multi-threaded exceptions in the nditer in the 'errmsg' parameter: http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_GetIterNext Worth considering is also the fact that the PyGILState API is incompatible with multiple embedded interpreters. Maybe that's not something anyone does with NumPy, though.
-Mark Geoffrey ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
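The fetch-the-exception-in-one-thread, restore-it-in-another pattern described above (what cython.parallel does at the C level with PyErr_Fetch/PyErr_Restore) has a simple Python-level analogue worth keeping in mind. This is only an illustrative sketch of the idea, not the C-API mechanism the thread is debating: capture the exception object in the worker thread, then re-raise it in the main thread where the caller can handle it.

```python
import threading

def worker(out):
    try:
        raise ValueError("error raised off the main thread")
    except BaseException as exc:
        # Capture the exception instead of letting the thread die silently.
        out.append(exc)

captured = []
t = threading.Thread(target=worker, args=(captured,))
t.start()
t.join()

# Re-raise in the main thread, where the caller can handle it normally.
if captured:
    try:
        raise captured[0]
    except ValueError as exc:
        print("propagated:", exc)
```

This is essentially what `concurrent.futures` does for you: a `Future` stores the worker's exception and re-raises it when `.result()` is called from the consuming thread.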
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On 5 December 2011 17:57, mark florisson markflorisso...@gmail.com wrote: snip
snip There is in fact the additional problem that PyGILState_Ensure would initialize a threadstate, you set an exception, and when you call PyGILState_Release the threadstate gets deleted along with the exception, before you even have a chance to check with PyErr_Occurred(). To clarify, this case will only happen if you're doing this from a non-Python thread that doesn't have a threadstate to begin with. snip
-Mark Geoffrey
Re: [Numpy-discussion] NumPy Governance
On Sat, Dec 3, 2011 at 6:18 PM, Travis Oliphant teoliph...@gmail.com wrote: Hi everyone, There have been some wonderfully vigorous discussions over the past few months that have made it clear that we need some clarity about how decisions will be made in the NumPy community. When we were a smaller bunch of people it seemed easier to come to an agreement, and things pretty much evolved based on (mostly) consensus and who was available to actually do the work. There is a need for a clearer structure so that we know how decisions will get made and so that code can move forward while paying attention to the current user base. There has been a steering committee structure for SciPy in the past, and I have certainly been prone to lump both NumPy and SciPy together given that I have a strong interest in and have spent a great amount of time working on both projects. Others have also spent time on both projects. However, I think it is critical at this stage to clearly separate the projects and define a governing structure that is fair and agreeable for NumPy. SciPy has multiple modules and will probably need structure around each module independently. For now, I wanted to open up a discussion to see what people thought about NumPy's governance. My initial thoughts:
* discussions happen as they do now on the mailing list
* a small group of developers (5-11) constitute the board and major decisions are made by vote of that group (not just simple majority --- needs at least 2/3 +1 votes).
* votes are +1/+0/-0/-1
* if a topic is difficult to resolve it is moved off the main list and discussed on a separate board mailing list --- these should be rare, but parts of the NA discussion would probably qualify
* This board mailing list is publicly viewable but only board members may post.
* The board is renewed and adjusted each year --- based on nomination and 2/3 vote of the current board until the board is at 11.
* The chairman of the board is elected by a majority of the board and has veto power unless overridden by 3/4 of the board.
* Petitions to remove people from the board can be made by 50+ independent reverse nominations (hopefully people will just withdraw if they are no longer active).
All of these points are open for discussion. I just thought I would start the conversation. I will be much more active this next year with NumPy and will be very interested in the direction NumPy is taking. I'm hoping to discern from this conversation who else is very interested in the direction of NumPy so that the first board can be formally constituted. I'm definitely in support of something along these lines. My experience entering NumPy development was that the development process, coding standards, and other aspects of the process are not very well specified, and people have many differing ideas about what has already been agreed upon. I would recommend that fixing this state of affairs be placed high on the agenda of the board, with the goal of making it easier to attract new developers. A few people have proposed the BDFL approach, as in CPython development. In practice, I believe Guido has done very well in the role because he only uses the power as a last resort. Even if NumPy adopts a similar approach, having a board along the lines Travis proposes would still be a good thing, and having a BDFL would just mean that there's someone who could override the will of the board and make an entirely different choice. It may be worth considering how the governance structure is related to the different levels of the NumPy codebase. There is a (very) small group of people who have contributed significant amounts of C code, a larger group of people who have contributed significant amounts of Python code, many people who have contributed small C and/or Python patches, and a large number of people who contribute bug reports, email list comments, etc.
It may be worth designing the board taking into account these different groups of developers and users. -Mark Best regards, -Travis
[Numpy-discussion] numpy 1.7.0 release?
Hi all, It's been a little over 6 months since the release of 1.6.0 and the NA debate has quieted down, so I'd like to ask your opinion on the timing of 1.7.0. It looks to me like we have a healthy amount of bug fixes and small improvements, plus three larger chunks of work:
- datetime
- NA
- Bento support
My impression is that both datetime and NA are releasable, but should be labeled tech preview or something similar, because they may still see significant changes. Please correct me if I'm wrong. There's still some maintenance work to do and pull requests to merge, but a beta release by Christmas should be feasible. What do you all think? Cheers, Ralf
Re: [Numpy-discussion] numpy 1.7.0 release?
I like the idea. Is there resolution to the NA question? -- Travis Oliphant (on a mobile) 512-826-7480 On Dec 5, 2011, at 2:43 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: snip
Re: [Numpy-discussion] NumPy Governance
On Mon, Dec 5, 2011 at 12:43 PM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Dec 5, 2011 at 12:06 PM, Mark Wiebe mwwi...@gmail.com wrote: On Sat, Dec 3, 2011 at 6:18 PM, Travis Oliphant teoliph...@gmail.com wrote: snip
snip Just some thoughts I have from this discussion. 1. I think that we need to encourage and entice more NumPy developers/contributors. Having a board of only a few core developers puts us right back in the same boat we were in during the whole NA discussion, only more codified. Increasing the size of the board with more core developers would diversify thought and counteract groupthink. I think that this problem needs to be solved before anything else. Well, that's a tough one. NumPy development tends to attract folks with spare time, i.e., students*, and those with an itch to scratch. Itch scratched, degree obtained, they go back to their primary interest or on to jobs and the rest of life.
Re: [Numpy-discussion] numpy 1.7.0 release?
On Mon, Dec 5, 2011 at 9:13 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Dec 5, 2011 at 1:08 PM, Travis Oliphant oliph...@enthought.com wrote: I like the idea. Is there resolution to the NA question? No, people still disagree and are likely to do so for years to come with no end in sight. That's why the preview label. Agreed that it's not resolved, but I think we at least got to the point where we agreed not to back out the complete missing data additions. So if we clearly say that we keep all options for future API changes open (= preview label), I don't think the issue should hold up a numpy release indefinitely. Ralf
[Numpy-discussion] What does fftn take as parameters?
(Note I'm a programmer type, not a math type, and am doing coding directed by a matlab user.) I'm trying to do an fft on multiple columns of data at once (ultimately feeding into a correlation calculation). I can use fft() to work on one column: data=[23, 43, 53, 54, 0, 10] powtwo=8 # nearest power of two size numpy.fft.fft(data, powtwo) I want to do that but using fftn (the matlab user said it is the right function), but I can't work out from the docs or experimentation how the input data should be formatted, e.g. whether it is row major or column major. For example the above could be: data=[ [23, 43, 53, 54, 0, 10] ] or data=[ [23], [43], [53], [54], [0], [10] ] All the examples in the docs use square inputs (i.e. x and y axes are the same length), so that doesn't help. The documentation shows examples of the output, but not the input. I found code passing in a single int (not a list of int) as the s parameter, but that also gives me an error. Roger
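One way to sidestep the row-major/column-major question is to pick the transform axis explicitly. A small sketch (the data values are made up, extending the poster's single column to two columns): `np.fft.fft` takes `axis`, and `np.fft.fftn` takes `s` (padded lengths) and `axes` as sequences, which should reduce to the same per-column transform when restricted to one axis.

```python
import numpy as np

# Two 6-sample signals stored as the columns of a 2-D array.
data = np.array([[23, 12],
                 [43, 11],
                 [53,  7],
                 [54, 20],
                 [ 0,  5],
                 [10,  9]], dtype=float)

n = 8  # zero-pad each transform to the nearest power of two

# axis=0 transforms each column independently, padding along that axis.
col_ffts = np.fft.fft(data, n=n, axis=0)
print(col_ffts.shape)  # (8, 2): one length-8 transform per column

# The first column's transform matches transforming that column alone.
single = np.fft.fft(data[:, 0], n=n)
print(np.allclose(col_ffts[:, 0], single))  # True

# fftn restricted to one axis does the same; note s is a sequence,
# which is why passing a bare int for s raises an error.
nd = np.fft.fftn(data, s=(n,), axes=(0,))
print(np.allclose(nd, col_ffts))  # True
```

So for per-column transforms of a 2-D array, `fft(..., axis=0)` is enough; `fftn` only adds value when transforming over several axes at once.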
Re: [Numpy-discussion] numpy 1.7.0 release
snip I think a release is a good idea. In addition to the previous points mentioned, having NA in as a preview in a 1.7.0 release will likely raise its visibility - a lot of people will read the release notes of a newer version but won't ever track discussions on a mailing list. Tim Burgess Software Engineer - Coral Reef Watch Satellite Applications and Research - NESDIS National Oceanic and Atmospheric Administration
Re: [Numpy-discussion] ignore NAN in numpy.true_divide()
Maybe I am asking the wrong question or could go about this another way. I have thousands of numpy arrays to flick through; could I just identify which arrays have NaNs and for now ignore the entire array? Is there a simple way to do this? Any feedback will be greatly appreciated. On Thu, Dec 1, 2011 at 12:16 PM, questions anon questions.a...@gmail.com wrote: I am trying to calculate the mean across many netcdf files. I cannot use numpy.mean because there are too many files to concatenate and I end up with a memory error. I have enabled the below code to do what I need, but I have a few NaN values in some of my arrays. Is there a way to ignore these somewhere in my code? I seem to face this problem often, so I would love a command that ignores blanks in my array before I continue on to the next processing step. Any feedback is greatly appreciated.

netCDF_list=[]
for dir in glob.glob(MainFolder + '*/01/') + glob.glob(MainFolder + '*/02/') + glob.glob(MainFolder + '*/12/'):
    for ncfile in glob.glob(dir + '*.nc'):
        netCDF_list.append(ncfile)

slice_counter=0
print netCDF_list
for filename in netCDF_list:
    ncfile=netCDF4.Dataset(filename)
    TSFC=ncfile.variables['T_SFC'][:]
    fillvalue=ncfile.variables['T_SFC']._FillValue
    TSFC=MA.masked_values(TSFC, fillvalue)
    for i in xrange(0, len(TSFC)-1, 1):
        slice_counter += 1
        #print slice_counter
        try:
            running_sum=N.add(running_sum, TSFC[i])
        except NameError:
            print "Initiating the running total of my variable..."
            running_sum=N.array(TSFC[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg
Re: [Numpy-discussion] ignore NAN in numpy.true_divide()
Hi, I don't know if it is the best choice, but this is what I do in my code:

for each slice:
    indexnonNaN = np.isfinite(SliceOfToto)
    SliceOfTotoWithoutNan = SliceOfToto[indexnonNaN]

and then perform all the operations I want on the last array. I hope that answers your question. Xavier 2011/12/6 questions anon questions.a...@gmail.com snip -- "When the government violates the people's rights, insurrection is, for the people and for each portion of the people, the most sacred of rights and the most indispensable of duties." Declaration of the Rights of Man and of the Citizen, Article 35, 1793
Re: [Numpy-discussion] NumPy Governance
On Mon, Dec 5, 2011 at 12:10 PM, Charles R Harris charlesr.har...@gmail.com wrote: Well, that's a tough one. NumPy development tends to attract folks with spare time, i.e., students*, and those with an itch to scratch. Itch scratched, degree obtained, they go back to their primary interest or on to jobs and the rest of life. NumPy does seem to be different in this regard, in that many of the developers stick around (even if they're not active on the code any longer), think about potential issues and new directions, take part in discussions, teach at conferences, organise workshops, write, etc. I agree with Matthew that using a board should be a last resort, and mildly disagree with Perry that it would be better to have a single person make the final call. The advantage of a benevolent dictator is that you have a coherent driving vision, but at the cost of sacrificing community ownership. As for barriers to entry, improving the nature of discourse on the mailing list (when it comes to thorny issues) would be good. Technical barriers are not that hard to breach for our community; setting the right social atmosphere is crucial. Regards Stéfan
Re: [Numpy-discussion] ignore NAN in numpy.true_divide()
Thanks for responding. I have tried several ways of adding the command, one of which is:

for i in TSFC:
    if N.any(N.isnan(TSFC)):
        break
    else:
        pass

but nothing is happening. Is there some particular way I need to add this command? I have posted it all below:

netCDF_list=[]
for dir in glob.glob(MainFolder + '*/01/') + glob.glob(MainFolder + '*/02/') + glob.glob(MainFolder + '*/12/'):
    #print dir
    for ncfile in glob.glob(dir + '*.nc'):
        netCDF_list.append(ncfile)

slice_counter=0
print netCDF_list
for filename in netCDF_list:
    ncfile=netCDF4.Dataset(filename)
    TSFC=ncfile.variables['T_SFC'][:]
    fillvalue=ncfile.variables['T_SFC']._FillValue
    TSFC=MA.masked_values(TSFC, fillvalue)
    for a in TSFC:
        if N.any(N.isnan(TSFC)):
            break
        else:
            pass
    for i in xrange(0, len(TSFC)-1, 1):
        slice_counter += 1
        #print slice_counter
        try:
            running_sum=N.add(running_sum, TSFC[i])
        except NameError:
            print "Initiating the running total of my variable..."
            running_sum=N.array(TSFC[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg

On Tue, Dec 6, 2011 at 9:45 AM, David Cournapeau courn...@gmail.com wrote: On Mon, Dec 5, 2011 at 5:29 PM, questions anon questions.a...@gmail.com wrote: Maybe I am asking the wrong question or could go about this another way. I have thousands of numpy arrays to flick through; could I just identify which arrays have NaNs and for now ignore the entire array? Is there a simple way to do this? Doing np.any(np.isnan(a)) for an array a should answer this exact question. David
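David's `np.any(np.isnan(a))` test can be wrapped in a small helper; this is only a sketch with made-up arrays, not the poster's netCDF data. Note that in the posted loop the `break` only exits the inner `for a in TSFC` loop and nothing else happens; to skip a file entirely, the test belongs at the file level with a `continue`.

```python
import numpy as np

def has_nan(arr):
    # True if any element of arr is NaN; works on any shape.
    return bool(np.any(np.isnan(arr)))

clean = np.array([[1.0, 2.0], [3.0, 4.0]])
dirty = np.array([[1.0, np.nan], [3.0, 4.0]])

print(has_nan(clean))  # False
print(has_nan(dirty))  # True

# Skip whole arrays that contain NaNs, as proposed in the thread;
# inside a file loop this would be: if has_nan(TSFC): continue
arrays = [clean, dirty]
kept = [a for a in arrays if not has_nan(a)]
print(len(kept))  # 1
```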
Re: [Numpy-discussion] ignore NAN in numpy.true_divide()
Well, I see two solutions. 1- Keep your code as it is, with a Python list (you can stack numpy arrays if they have the same dimensions):

for filename in netCDF_list:
    ncfile=netCDF4.Dataset(filename)
    TSFC=ncfile.variables['T_SFC'][:]
    fillvalue=ncfile.variables['T_SFC']._FillValue
    TSFC=MA.masked_values(TSFC, fillvalue)
    TSFCWithOutNan=[]
    for a in TSFC:
        indexnonNaN=N.isfinite(a)
        SliceofTotoWithoutNan=a[indexnonNaN]
        print SliceofTotoWithoutNan
        TSFCWithOutNan.append(SliceofTotoWithoutNan)
    for i in xrange(0, len(TSFCWithOutNan)-1, 1):
        slice_counter += 1
        #print slice_counter
        try:
            running_sum=N.add(running_sum, TSFCWithOutNan[i])
        except NameError:
            print "Initiating the running total of my variable..."
            running_sum=N.array(TSFCWithOutNan[i])
...

or 2- Everything in the same loop:

slice_counter=0
for a in TSFC:
    indexnonNaN=N.isfinite(a)
    SliceofTotoWithoutNan=a[indexnonNaN]
    slice_counter += 1
    #print slice_counter
    try:
        running_sum=N.add(running_sum, SliceofTotoWithoutNan)
    except NameError:
        print "Initiating the running total of my variable..."
        running_sum=N.array(SliceofTotoWithoutNan)

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg

See if it works; it is just a rapid guess. Xavier

for dir in glob.glob(MainFolder + '*/01/') + glob.glob(MainFolder + '*/02/') + glob.glob(MainFolder + '*/12/'):
    #print dir
    for ncfile in glob.glob(dir + '*.nc'):
        netCDF_list.append(ncfile)

slice_counter=0
print netCDF_list
for filename in netCDF_list:
    ncfile=netCDF4.Dataset(filename)
    TSFC=ncfile.variables['T_SFC'][:]
    fillvalue=ncfile.variables['T_SFC']._FillValue
    TSFC=MA.masked_values(TSFC, fillvalue)
    for a in TSFC:
        indexnonNaN=N.isfinite(a)
        SliceofTotoWithoutNan=a[indexnonNaN]
        print SliceofTotoWithoutNan
        TSFC=SliceofTotoWithoutNan
    for i in xrange(0, len(TSFC)-1, 1):
        slice_counter += 1
        #print slice_counter
        try:
            running_sum=N.add(running_sum, TSFC[i])
        except NameError:
            print "Initiating the running total of my variable..."
            running_sum=N.array(TSFC[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg

On Tue, Dec 6, 2011 at 9:50 AM, Xavier Barthelemy xab...@gmail.com wrote: snip
Re: [Numpy-discussion] ignore NAN in numpy.true_divide()
thanks again for you response. I must still be doing something wrong!! both options resulted in : the TSFC_avg is: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 1st option: slice_counter=0 for filename in netCDF_list: ncfile=netCDF4.Dataset(filename) TSFC=ncfile.variables['T_SFC'][:] fillvalue=ncfile.variables['T_SFC']._FillValue TSFC=MA.masked_values(TSFC, fillvalue) TSFCWithOutNan=[] for a in TSFC: indexnonNaN=N.isfinite(a) SliceofTotoWithoutNan=a[indexnonNaN] print SliceofTotoWithoutNan TSFCWithOutNan.append(SliceofTotoWithoutNan) for i in xrange(0,len(TSFCWithOutNan)-1,1): slice_counter +=1 try: running_sum=N.add(running_sum, TSFCWithOutNan[i]) except NameError: print Initiating the running total of my variable... running_sum=N.array(TSFCWithOutNan[i]) TSFC_avg=N.true_divide(running_sum, slice_counter) N.set_printoptions(threshold='nan') print the TSFC_avg is:, TSFC_avg the 2nd option : for filename in netCDF_list: ncfile=netCDF4.Dataset(filename) TSFC=ncfile.variables['T_SFC'][:] fillvalue=ncfile.variables['T_SFC']._FillValue TSFC=MA.masked_values(TSFC, fillvalue) slice_counter=0 for a in TSFC: indexnonNaN=N.isfinite(a) SliceofTotoWithoutNan=a[indexnonNaN] slice_counter +=1 try: running_sum=N.add(running_sum, SliceofTotoWithoutNan) except NameError: print Initiating the running total of my variable... 
                running_sum=N.array(SliceofTotoWithoutNan)
    TSFC_avg=N.true_divide(running_sum, slice_counter)
    N.set_printoptions(threshold='nan')
    print "the TSFC_avg is:", TSFC_avg

On Tue, Dec 6, 2011 at 2:31 PM, Xavier Barthelemy xab...@gmail.com wrote:
Well, I would see two solutions:

1 - keep your code as it is, with a python list (you can stack numpy arrays if they have the same dimensions):

    for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        TSFCWithOutNan=[]
        for a in TSFC:
            indexnonNaN=N.isfinite(a)
            SliceofTotoWithoutNan=a[indexnonNaN]
            print SliceofTotoWithoutNan
            TSFCWithOutNan.append(SliceofTotoWithoutNan)
        for i in xrange(0,len(TSFCWithOutNan)-1,1):
            slice_counter +=1
            #print slice_counter
            try:
                running_sum=N.add(running_sum, TSFCWithOutNan[i])
            except NameError:
                print "Initiating the running total of my variable..."
                running_sum=N.array(TSFCWithOutNan[i])
    ...

or 2 - everything in the same loop:

    slice_counter=0
    for a in TSFC:
        indexnonNaN=N.isfinite(a)
        SliceofTotoWithoutNan=a[indexnonNaN]
        slice_counter +=1
        #print slice_counter
        try:
            running_sum=N.add(running_sum, SliceofTotoWithoutNan)
        except NameError:
            print "Initiating the running total of my variable..."
            running_sum=N.array(SliceofTotoWithoutNan)
    TSFC_avg=N.true_divide(running_sum, slice_counter)
    N.set_printoptions(threshold='nan')
    print "the TSFC_avg is:", TSFC_avg

See if it works.
It is just a rapid guess. Xavier

    for dir in glob.glob(MainFolder + '*/01/') + glob.glob(MainFolder + '*/02/') + glob.glob(MainFolder + '*/12/'):
        #print dir
        for ncfile in glob.glob(dir + '*.nc'):
            netCDF_list.append(ncfile)
    slice_counter=0
    print netCDF_list
    for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        for a in TSFC:
            indexnonNaN=N.isfinite(a)
            SliceofTotoWithoutNan=a[indexnonNaN]
            print SliceofTotoWithoutNan
            TSFC=SliceofTotoWithoutNan
        for i in xrange(0,len(TSFC)-1,1):
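A note on why both variants produce an all-masked average: removing the NaNs slice-by-slice yields ragged arrays that no longer line up for the running sum. Not from the thread, but here is a sketch of an approach that keeps the per-cell shape: accumulate a sum and a count of finite values per grid cell, then divide. The names (`running_nanmean`, the toy slices) are hypothetical.

```python
import numpy as np

def running_nanmean(slices):
    # Accumulate a per-cell sum and a per-cell count of finite values,
    # so NaN cells are skipped without changing the 2-D shape.
    total = None
    count = None
    for a in slices:
        a = np.asarray(a, dtype=float)
        valid = np.isfinite(a)
        filled = np.where(valid, a, 0.0)
        if total is None:
            total = filled.copy()
            count = valid.astype(int)
        else:
            total += filled
            count += valid
    # Cells that were NaN in every slice come out masked, not zero.
    mean = np.where(count > 0, total / np.maximum(count, 1), np.nan)
    return np.ma.masked_invalid(mean)

a = np.array([[1.0, np.nan], [3.0, 4.0]])
b = np.array([[3.0, 2.0], [np.nan, 6.0]])
m = running_nanmean([a, b])   # per-cell mean, NaNs ignored
```

Because only one slice plus two accumulator arrays are ever in memory, this scales to thousands of files without concatenation.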
Re: [Numpy-discussion] Apparently non-deterministic behaviour of complex array multiplication
Hi Nathaniel, Thanks for the suggestion. I more or less implemented it:

    np.save('X', X)
    X2=np.load('X.npy')
    X2=np.asmatrix(X2)
    diffy = (X != X2)
    if diffy.any():
        print X[diffy]
        print X2[diffy]
        print X[diffy][0].view(np.uint8)
        print X2[diffy][0].view(np.uint8)
    S=X*X.H/k
    S2=X2*X2.H/k
    nanElts=find(isnan(S))
    if len(nanElts)!=0:
        print 'WARNING: Nans in S:'+str(find(isnan(S)))
        print 'WARNING: Nans in S2:'+str(find(isnan(S2)))

My output (when I got NaN) mostly indicated that both arrays are numerically identical, and that they evaluated to have the same NaN-valued entries. For example:

    WARNING: Nans in S:[ 6 16]
    WARNING: Nans in S2:[ 6 16]

Another time I got as output:

    WARNING: Nans in S:[ 26 36 46 54 64 72 82 92 100 110 128 138 146 156 166 174 184 192 202 212 220 230 240 250 260 268 278 279 296 297 306 314 324 334 335 342 352 360 370 380 388 398 416 426 434 444 454 464 474]

with the identical list of indices reported for S2. These were different arrays, I think. At any rate, those two results appeared from two runs of the exact same code. I do not use any random numbers in the code, by the way. Most of the time the code runs without any NaN showing up at all, so this is an improvement. What was beyond odd: I have many fewer NaNs than I used to, but still got NaNs in S but NOT in S2! *I am pretty sure that one time there were NaNs in S, but not in S2, yet still no difference was observed between the two matrices X and X2. But I did not save that output, so I can't prove it to myself... but I am pretty sure I saw that. I will try and run memtest tonight. I am going out of town for a week and probably won't be able to test until next week. cheers, Karl
Nathaniel Smith wrote: If save/load actually makes a reliable difference, then it would be useful to do something like this, and see what you see:

    np.save("X", X)
    X2 = np.load("X.npy")
    # did save/load change anything?
    diff = (X != X2)
    diff.any()
    # if so, then what changed?
    X[diff]
    X2[diff]
    # any subtle differences in floating point representation?
    X[diff][0].view(np.uint8)
    X2[diff][0].view(np.uint8)

(You should still run memtest. It's very easy: just install it with your package manager, then reboot. Hold down the shift key while booting and you'll get a boot menu. Choose memtest, and then leave it to run overnight.) - Nathaniel

On Dec 2, 2011 10:10 PM, kneil magnetotellur...@gmail.com wrote: -- View this message in context: http://old.nabble.com/Apparently-non-deterministic-behaviour-of-complex-array-multiplication-tp32893004p32922174.html Sent from the Numpy-discussion mailing list archive at Nabble.com. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
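For completeness, a small self-contained version of that round-trip check (the file name and test data are illustrative). One caveat worth noting in this thread: NaN never compares equal to anything, so `X != X2` flags NaN entries even when their bytes survived save/load perfectly; the byte-level view is the more reliable test.

```python
import numpy as np
import os
import tempfile

x = np.array([1.0, -0.0, np.pi, np.nan])
path = os.path.join(tempfile.mkdtemp(), "x.npy")
np.save(path, x)
x2 = np.load(path)

# Element-wise comparison: NaN != NaN is always True, so this
# reports a "difference" at the NaN slot even though nothing changed.
elementwise = (x != x2)

# Byte-level comparison: identical bit patterns mean a perfect round trip
# (this also distinguishes -0.0 from 0.0, which == cannot).
bytes_equal = (x.view(np.uint8) == x2.view(np.uint8)).all()
```

So if `X != X2` fires only at NaN positions while the `uint8` views match, save/load did not actually alter anything.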
Re: [Numpy-discussion] NumPy Governance
Hi, 2011/12/5 Stéfan van der Walt ste...@sun.ac.za: As for barriers to entry, improving the nature of discourse on the mailing list (when it comes to thorny issues) would be good. Technical barriers are not that hard to breach for our community; setting the right social atmosphere is crucial. I'm just about to get on a plane and am going to be out of internet range for a while, so, in the spirit of constructive discussion, and in the spirit of use-cases: would it be fair to say that the two contentious recent discussions have been the numpy ABI breakage (2.0 vs 1.5.1) discussion and the masked array discussion(s)? What did we do wrong or right in each of these two discussions? What could we have done better? What process would help us do better? Travis - regarding your board-only-post mailing list - my feeling is that this is going in the wrong direction. The effect of a board-only mailing list is to explicitly remove non-qualified people from the discussion. This will make it more explicit that the substantial decisions will be made by a few important people. Do you (Travis - or Mark?) think that, if this had happened earlier in the masked array discussion, it would have been less contentious, or had more substantial content? My instinct is the reverse: the best solution would have been to pause and commit to beating out the issues and getting agreement. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] idea of optimisation?
Hi everyone, I was wondering if there is a more optimal way to write what follows. I am studying waves, so I have an array of wave crest positions, XCrest, and the positions of the zero crossings, Xzeros. The goal is to find between which Xzeros my crests are.

    XXX1=XCrest
    CrestZerosNeighbour=np.zeros([len(XCrest),2], dtype='d')
    for nn in range(len(Xzeros)-1):
        X1=Xzeros[nn]
        X2=Xzeros[nn+1]
        indexxx1=np.where((X1 <= XXX1) & (XXX1 < X2))
        try:
            CrestZerosNeighbour[indexxx1[0]]=np.array([X1,X2])
        except:
            pass

Does someone have an idea? Something in the spirit of numpy.ma.masked_outside, which does exactly the opposite of what I want: it masks an array outside an interval. I would like to mask everything except the interval that contains my value. I do this operation a large number of times, and a loop is time consuming. thanks Xavier -- « Quand le gouvernement viole les droits du peuple, l'insurrection est, pour le peuple et pour chaque portion du peuple, le plus sacré des droits et le plus indispensable des devoirs » Déclaration des droits de l'homme et du citoyen, article 35, 1793 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] idea of optimisation?
Excerpts from Xavier Barthelemy's message of mar. déc. 06 06:53:09 +0100 2011:
Hi everyone, I was wondering if there is a more optimal way to write what follows. I am studying waves, so I have an array of wave crest positions, XCrest, and the positions of the zero crossings, Xzeros. The goal is to find between which Xzeros my crests are.

    XXX1=XCrest
    CrestZerosNeighbour=np.zeros([len(XCrest),2], dtype='d')
    for nn in range(len(Xzeros)-1):
        X1=Xzeros[nn]
        X2=Xzeros[nn+1]
        indexxx1=np.where((X1 <= XXX1) & (XXX1 < X2))
        try:
            CrestZerosNeighbour[indexxx1[0]]=np.array([X1,X2])
        except:
            pass

Someone has an idea? Something in the spirit of numpy.ma.masked_outside, which does exactly the opposite of what I want: it masks an array outside an interval. I would like to mask everything except the interval that contains my value. I do this operation a large number of times, and a loop is time consuming.

Hi, My first idea would be to write a function in C or Fortran that returns Xzeros indices (instead of values). Algorithms may be optimized according to the inputs: if XCrest and Xzeros are sorted, or if len(XCrest) < len(Xzeros), using dichotomy... But I would be interested to see a solution with a masked array too. -- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
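If Xzeros is sorted (Xavier confirms later in the thread that it can be), np.searchsorted already does the dichotomy suggested above, vectorized over every crest in one call, with no C or Fortran needed. A sketch using toy values in the spirit of the thread:

```python
import numpy as np

Xzeros = np.array([1.0, 2.0, 3.0, 4.0])   # sorted zero-crossing positions
XCrest = np.array([1.5, 1.7, 3.5])        # crest positions

# Index of the zero crossing just left of each crest, i.e.
# Xzeros[idx] <= crest < Xzeros[idx + 1]
idx = np.searchsorted(Xzeros, XCrest, side='right') - 1

# Drop crests that fall outside [Xzeros[0], Xzeros[-1])
valid = (idx >= 0) & (idx < len(Xzeros) - 1)

CrestZerosNeighbour = np.column_stack((Xzeros[idx[valid]],
                                       Xzeros[idx[valid] + 1]))
```

This replaces the loop over zero-crossing pairs with a single binary search per crest, which is O(len(XCrest) * log(len(Xzeros))).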
Re: [Numpy-discussion] What does fftn take as parameters?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 05/12/11 14:19, David Cournapeau wrote: I am not sure I understand what you are trying to do? I had a slight misunderstanding with the math guy and had believed that for our purposes we could feed in 16 columns and get one column of fft output. However, we do actually need 16 columns of output, each corresponding to a column of input. It seems he is obsessed with optimisation, and apparently when calculating an fft of a known size it would save some redundant calculations to operate on all 16 columns at once rather than doing them one at a time. That is what he assumed fftn did, from the description. numpy.fft.fft will compute the fft on every *row*, or on every column if you pass the axis=0 argument. Note that I am using regular Python lists (they were JSON at one point), and the fft documentation is incomprehensible to someone who hasn't used numpy before and only cares about fft (there are a lot of matches for Google searches about fft and python pointing to numpy). The doc doesn't actually say what axis is and doesn't have an example. Additionally, a shape attribute is used which is peculiar to whatever numpy uses as its data representation. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) iEYEARECAAYFAk7dyRwACgkQmOOfHg372QToxgCfR7IoUfgGQVZEEiElnjbtx7yx R8EAnRfDg4y7AfFeSA8sQxVCq6ucgRG1 =gg2h -END PGP SIGNATURE- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
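To make the axis= point concrete: a plain nested Python list can be passed straight to np.fft.fft (numpy converts it), and axis=0 transforms each column in a single call, giving output the same shape as the input. The sizes here are illustrative.

```python
import numpy as np

# 8 samples per channel, 16 channels (one per column);
# a nested Python list works directly, no explicit array needed.
data = [[float(r * c) for c in range(16)] for r in range(8)]

# One FFT per column, all 16 computed in a single call.
spec = np.fft.fft(data, axis=0)

# Equivalent to transforming each column separately:
col1 = np.fft.fft([row[1] for row in data])
```

`spec.shape` is `(8, 16)`: column j of the output is the FFT of column j of the input, which is exactly the "16 columns in, 16 columns out" behaviour wanted here.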
Re: [Numpy-discussion] idea of optimisation?
OK, let me be more precise. I have a Z array which is the elevation. From this I extract a discrete array of zero crossings, and another discrete array of crests. len(crest) is different from len(Xzeros): I have a threshold method to detect my valid crests, and sometimes there are 2 crests between two zero crossings (grouping effect). Crest and Zeros are 2 different arrays, with positions. Example:

    Zeros=[1, 2, 3, 4]
    Crests=[1.5, 1.7, 3.5]

And yes, the arrays can be sorted; not a problem with this. Xavier

2011/12/6 David Froger david.fro...@gmail.com
Excerpts from Xavier Barthelemy's message of mar. déc. 06 06:53:09 +0100 2011:
Hi everyone, I was wondering if there is a more optimal way to write what follows. I am studying waves, so I have an array of wave crest positions, XCrest, and the positions of the zero crossings, Xzeros. The goal is to find between which Xzeros my crests are.

    XXX1=XCrest
    CrestZerosNeighbour=np.zeros([len(XCrest),2], dtype='d')
    for nn in range(len(Xzeros)-1):
        X1=Xzeros[nn]
        X2=Xzeros[nn+1]
        indexxx1=np.where((X1 <= XXX1) & (XXX1 < X2))
        try:
            CrestZerosNeighbour[indexxx1[0]]=np.array([X1,X2])
        except:
            pass

Someone has an idea? Something in the spirit of numpy.ma.masked_outside, which does exactly the opposite of what I want: it masks an array outside an interval. I would like to mask everything except the interval that contains my value. I do this operation a large number of times, and a loop is time consuming.

Hi, My first idea would be to write a function in C or Fortran that returns Xzeros indices (instead of values). Algorithms may be optimized according to the inputs: if XCrest and Xzeros are sorted, or if len(XCrest) < len(Xzeros), using dichotomy... But I would be interested to see a solution with a masked array too.
-- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- « Quand le gouvernement viole les droits du peuple, l'insurrection est, pour le peuple et pour chaque portion du peuple, le plus sacré des droits et le plus indispensable des devoirs » Déclaration des droits de l'homme et du citoyen, article 35, 1793 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion