[Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Sveinung Gundersen
Hi, We are developing a large project for genome analysis (http://hyperbrowser.uio.no), where we use memmap vectors as the basic data structure for storage. The stored data are accessed in slices, and used as basis for calculations. As the stored data may be large (up to 24 GB), the memory

Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 3:53 PM, Sveinung Gundersen svein...@gmail.com wrote: Hi, We are developing a large project for genome analysis (http://hyperbrowser.uio.no), where we use memmap vectors as the basic data structure for storage. The stored data are accessed in slices, and used as basis

Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Sveinung Gundersen
[snip] Your actual memory usage may not have increased as much as you think, since memmap objects don't necessarily take much memory -- it sounds like you're leaking virtual memory, but your resident set size shouldn't go up as much. As I understand it, memmap objects retain the contents

[Numpy-discussion] Numpy regression in 1.6.2 in deducing the dtype for record array

2012-07-02 Thread Sandro Tosi
Hello, I'd like to point you to this bug report just reported to Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679948 It would be really awesome if you could give a look and comment if the proposed fix would be appropriate. Thanks a lot, -- Sandro Tosi (aka morph, morpheus,

[Numpy-discussion] import numpy performance

2012-07-02 Thread Andrew Dalke
In this email I propose a few changes which I think are minor and which don't really affect the external NumPy API but which I think could improve the import numpy performance by at least 40%. This affects me because I and my clients use a chemistry toolkit which uses only NumPy arrays, and where

[Numpy-discussion] Fwd: an interesting single-file, cross-platform Python deployment tool.

2012-07-02 Thread Fernando Perez
Hi all, sorry for the slightly off-topic post, but I know that in our community many people often struggle with deployment issues (to colleagues, to experimental/hardware control machines, to one-off test machines, ...). I just stumbled upon this announcement by accident, and figured it might

Re: [Numpy-discussion] Fwd: an interesting single-file, cross-platform Python deployment tool.

2012-07-02 Thread klo uo
On Mon, Jul 2, 2012 at 9:26 PM, Fernando Perez wrote: ANNOUNCING eGenix PyRun - One file Python Runtime Version 1.0.0 An easy-to-use single file relocatable Python run-time - available for Windows, Mac OS X and Unix

[Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread Fernando Perez
Hi all, in recent work with a colleague, the need came up for a multivariate hypergeometric sampler; I had a look in the numpy code and saw we have the bivariate version, but not the multivariate one. I had a look at the code in scipy.stats.distributions, and it doesn't look too difficult to add
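For readers unfamiliar with the term, a multivariate hypergeometric draw can be sketched in plain Python as sampling without replacement from an urn and counting the types drawn. This is only an illustration under stated assumptions, not the NumPy, SciPy, or PyMC code discussed in the thread; the function name and its parameters are hypothetical.

```python
import random
from collections import Counter


def multivariate_hypergeometric(colors, nsample, rng=None):
    """Draw `nsample` items without replacement from an urn that holds
    colors[i] items of type i; return how many of each type were drawn.

    Hypothetical helper for illustration only. It builds the whole urn
    in memory, so it is only suitable for small populations.
    """
    rng = rng or random
    # One label per item in the urn, e.g. [0, 0, 0, 1, 1, 2] for colors=[3, 2, 1].
    population = [i for i, count in enumerate(colors) for _ in range(count)]
    drawn = rng.sample(population, nsample)  # sampling without replacement
    counts = Counter(drawn)
    return [counts.get(i, 0) for i in range(len(colors))]


print(multivariate_hypergeometric([5, 3, 2], 4))
```

The result always sums to `nsample`, and no entry exceeds the corresponding entry of `colors`. A production sampler would more likely chain univariate hypergeometric draws instead of materialising the urn.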

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread David Cournapeau
On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke da...@dalkescientific.com wrote: In this email I propose a few changes which I think are minor and which don't really affect the external NumPy API but which I think could improve the import numpy performance by at least 40%. This affects me because

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke da...@dalkescientific.com wrote: In this email I propose a few changes which I think are minor and which don't really affect the external NumPy API but which I think could improve the import numpy performance by at least 40%. This affects me because

Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen svein...@gmail.com wrote: [snip] Your actual memory usage may not have increased as much as you think, since memmap objects don't necessarily take much memory -- it sounds like you're leaking virtual memory, but your resident set size

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Benjamin Root
On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke da...@dalkescientific.com wrote: In this email I propose a few changes which I think are minor and which don't really affect the external NumPy API but which I think could

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Robert Kern
On Mon, Jul 2, 2012 at 9:43 PM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith n...@pobox.com wrote: I think this ship has sailed, but it'd be worth looking into lazy importing, where 'numpy.fft' isn't actually imported until someone starts using it.
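The lazy-importing idea mentioned above can be sketched with the standard library's `importlib.util.LazyLoader`, which postdates this thread (Python 3.5 and later). This is an assumption-laden illustration of the general technique, not something NumPy does or that anyone in the thread proposed in this form; the helper name `lazy_import` is hypothetical, and the stdlib module `colorsys` stands in for a submodule such as 'numpy.fft'.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access.

    Hypothetical helper following the recipe in the importlib documentation.
    """
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers the real execution of the module body
    return module


colorsys = lazy_import("colorsys")           # cheap: module body not yet executed
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))    # first attribute access triggers the real import
```

The trade-off is that any import error surfaces at first use and not at import time, which is one reason such schemes tend to be contentious.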

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Pauli Virtanen
02.07.2012 21:17, Andrew Dalke wrote: [clip] 1) remove add_newdocs and put the docstrings in the C code 'add_newdocs' still needs to be there. The docstrings need to be in an easily parseable format, because of the online documentation editor. Keeping the current format may be the easiest

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Andrew Dalke
On Jul 2, 2012, at 10:33 PM, David Cournapeau wrote: On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke da...@dalkescientific.com wrote: In July of 2008 I started a thread about how import numpy was noticeably slow for one of my customers. ... I managed to get the import time down from 0.21

Re: [Numpy-discussion] Fwd: an interesting single-file, cross-platform Python deployment tool.

2012-07-02 Thread Andrea Gavana
On 2 July 2012 22:11, klo uo wrote: On Mon, Jul 2, 2012 at 9:26 PM, Fernando Perez wrote: ANNOUNCING eGenix PyRun - One file Python Runtime Version 1.0.0 An easy-to-use single file relocatable Python run-time - available

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Fernando Perez
On Mon, Jul 2, 2012 at 2:26 PM, Andrew Dalke da...@dalkescientific.com wrote: so the relevant timing test is more likely: % time python -c 'import numpy.core.multiarray' 0.086u 0.031s 0:00.12 91.6% 0+0k 0+0io 0pf+0w No, that's the wrong thing to test, because it effectively amounts to
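Timings like the one quoted above come from running the import in a fresh interpreter. A rough way to reproduce that kind of measurement from Python is sketched below. The helper name `time_import` is hypothetical, and `import json` is only a stand-in so the snippet runs without NumPy installed; substitute the statement you actually want to measure.

```python
import subprocess
import sys
import time


def time_import(statement, repeats=3):
    """Return the best wall-clock time (seconds) to run `statement`
    in a fresh interpreter. Hypothetical helper for illustration."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], check=True)
        best = min(best, time.perf_counter() - start)
    return best


# Interpreter startup dominates short imports, so measure a baseline too.
baseline = time_import("pass")
with_import = time_import("import json")  # stand-in for 'import numpy'
print(f"baseline: {baseline:.3f}s, with import: {with_import:.3f}s")
```

Importing a submodule of a package also executes the package's `__init__`, so timing a dotted import this way measures the whole package import, which is the point being made in this message.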

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Andrew Dalke
On Jul 2, 2012, at 10:34 PM, Nathaniel Smith wrote: I don't have any opinion on how acceptable this would be, but I also don't see a benchmark showing how much this would help? The profile output was lower in that email. The relevant line is 0.038 add_newdocs (numpy.core.multiarray) This says

Re: [Numpy-discussion] Change in memmap behaviour

2012-07-02 Thread Sveinung Gundersen
On 2 July 2012, at 22:40, Nathaniel Smith wrote: On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen svein...@gmail.com wrote: [snip] Your actual memory usage may not have increased as much as you think, since memmap objects don't necessarily take much memory -- it sounds like you're

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 10:06 PM, Robert Kern robert.k...@gmail.com wrote: On Mon, Jul 2, 2012 at 9:43 PM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith n...@pobox.com wrote: I think this ship has sailed, but it'd be worth looking into lazy importing,

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Andrew Dalke
On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote: No, that's the wrong thing to test, because it effectively amounts to 'import numpy', since the numpy __init__ file is still executed. As David indicated, you must import multiarray.so by itself. I understand that clarification. However, it

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 10:44 PM, Andrew Dalke da...@dalkescientific.com wrote: On Jul 2, 2012, at 10:34 PM, Nathaniel Smith wrote: I don't have any opinion on how acceptable this would be, but I also don't see a benchmark showing how much this would help? The profile output was lower in that

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Fernando Perez
On Mon, Jul 2, 2012 at 3:15 PM, Andrew Dalke da...@dalkescientific.com wrote: Thus, I don't see any way that I can import 'multiarray' directly, because the underlying C code is the one which imports 'numpy.core.multiarray' and by design it is inaccessible to change from Python code. I was

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Nathaniel Smith
On Mon, Jul 2, 2012 at 11:15 PM, Andrew Dalke da...@dalkescientific.com wrote: On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote: No, that's the wrong thing to test, because it effectively amounts to 'import numpy', since the numpy __init__ file is still executed. As David indicated, you must

Re: [Numpy-discussion] Combined versus separate build

2012-07-02 Thread Nathaniel Smith
On Sun, Jul 1, 2012 at 9:17 PM, David Cournapeau courn...@gmail.com wrote: On Sun, Jul 1, 2012 at 8:32 PM, Nathaniel Smith n...@pobox.com wrote: On Sun, Jul 1, 2012 at 7:36 PM, David Cournapeau courn...@gmail.com wrote: On Sun, Jul 1, 2012 at 6:36 PM, Nathaniel Smith n...@pobox.com wrote: On

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread David Cournapeau
On Mon, Jul 2, 2012 at 11:15 PM, Andrew Dalke da...@dalkescientific.com wrote: On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote: No, that's the wrong thing to test, because it effectively amounts to 'import numpy', since the numpy __init__ file is still executed. As David indicated, you must

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Andrew Dalke
On Jul 3, 2012, at 12:21 AM, Nathaniel Smith wrote: Yes, but for a proper benchmark we need to compare this to the number that we would get with some other implementation... I'm assuming you aren't proposing we just delete the docstrings :-). I suspect that we have a different meaning of the

Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread Andrew Dalke
On Jul 3, 2012, at 12:46 AM, David Cournapeau wrote: It is indeed irrelevant to your end goal, but it does affect the interpretation of what import_array does, and thus of your benchmark Indeed. Focusing on polynomial seems the only sensible action. Except for test, all the other stuff seem

Re: [Numpy-discussion] Combined versus separate build

2012-07-02 Thread David Cournapeau
On Mon, Jul 2, 2012 at 11:34 PM, Nathaniel Smith n...@pobox.com wrote: To be clear, this subthread started with the caveat *as far as our officially supported platforms go* -- I'm not saying that we should go around and remove all the NPY_NO_EXPORT macros tomorrow. However, the only reason

Re: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted?

2012-07-02 Thread John Salvatier
Hi Fred, That's an excellent idea, but I am not too familiar with this use case. What do you mean by list in 'matrix[list]'? Is the use case, just incrementing in place a sub matrix of a numpy matrix? John On Fri, Jun 29, 2012 at 11:43 AM, Frédéric Bastien no...@nouiz.org wrote: Hi, I

[Numpy-discussion] Buildbot status

2012-07-02 Thread Stéfan van der Walt
Hi all, I'd like to find out what the current status of continuous integration is for numpy. I'm aware of: a) http://buildbot.scipy.org -- used by Ralf for testing releases? b) http://travis-ci.org -- connected via GitHub c) http://184.73.247.160:8111 -- dedicated Amazon EC2 with TeamCity d)

Re: [Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread josef . pktd
On Mon, Jul 2, 2012 at 4:16 PM, Fernando Perez fperez@gmail.com wrote: Hi all, in recent work with a colleague, the need came up for a multivariate hypergeometric sampler; I had a look in the numpy code and saw we have the bivariate version, but not the multivariate one. I had a look at

Re: [Numpy-discussion] Buildbot status

2012-07-02 Thread Fernando Perez
Useful-looking: http://gcc.gnu.org/wiki/CompileFarm

Re: [Numpy-discussion] Buildbot status

2012-07-02 Thread Ondřej Čertík
Hi Stefan, On Mon, Jul 2, 2012 at 5:07 PM, Stéfan van der Walt ste...@sun.ac.za wrote: Hi all, I'd like to find out what the current status of continuous integration is for numpy. I'm aware of: a) http://buildbot.scipy.org -- used by Ralf for testing releases? b) http://travis-ci.org --

[Numpy-discussion] f2py with allocatable arrays

2012-07-02 Thread Casey W. Stark
Hi numpy. Does anyone know if f2py supports allocatable arrays, allocated inside fortran subroutines? The old f2py docs seem to indicate that the allocatable array must be created with numpy, and dropped in the module. Here's more background to explain... I have a fortran subroutine that returns

Re: [Numpy-discussion] Buildbot status

2012-07-02 Thread Stéfan van der Walt
On Mon, Jul 2, 2012 at 5:31 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote: Yes, definitely. I will have time to work on the tests in about 2 weeks. Could you coordinate with Travis? He can make it official. I'd gladly coordinate with everyone, but I'd like to do it here on the mailing list so

Re: [Numpy-discussion] Buildbot status

2012-07-02 Thread Travis Oliphant
Ondrej should have time to work on this full time in the coming days. I think your list, Stefan, is as complete a list as we have. A few interns have investigated Team City and other CI systems and a combination of Jenkins and Travis CI has been suggested. NumFocus can provide some funding

Re: [Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread josef . pktd
On Mon, Jul 2, 2012 at 8:08 PM, josef.p...@gmail.com wrote: On Mon, Jul 2, 2012 at 4:16 PM, Fernando Perez fperez@gmail.com wrote: Hi all, in recent work with a colleague, the need came up for a multivariate hypergeometric sampler; I had a look in the numpy code and saw we have the

Re: [Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread Skipper Seabold
On Mon, Jul 2, 2012 at 9:35 PM, josef.p...@gmail.com wrote: On Mon, Jul 2, 2012 at 8:08 PM, josef.p...@gmail.com wrote: On Mon, Jul 2, 2012 at 4:16 PM, Fernando Perez fperez@gmail.com wrote: Hi all, in recent work with a colleague, the need came up for a multivariate

Re: [Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread Fernando Perez
On Mon, Jul 2, 2012 at 7:31 PM, Skipper Seabold jsseab...@gmail.com wrote: I could be wrong, but I think PyMC has sampling and likelihood. It appears you're right! http://pymc-devs.github.com/pymc/distributions.html?highlight=hypergeometric#pymc.distributions.multivariate_hypergeometric_like

Re: [Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread Fernando Perez
On Mon, Jul 2, 2012 at 7:49 PM, Fernando Perez fperez@gmail.com wrote: It appears you're right! http://pymc-devs.github.com/pymc/distributions.html?highlight=hypergeometric#pymc.distributions.multivariate_hypergeometric_like Furthermore, the code actually calls a sampler implemented in

Re: [Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread josef . pktd
On Mon, Jul 2, 2012 at 10:53 PM, Fernando Perez fperez@gmail.com wrote: On Mon, Jul 2, 2012 at 7:49 PM, Fernando Perez fperez@gmail.com wrote: It appears you're right! nice idea: https://github.com/pymc-devs/pymc/blob/master/pymc/distributions.py#L1670