On 11/15/2011 06:02 PM, Warren Weckesser wrote:


On Tue, Nov 15, 2011 at 10:48 AM, Andreas Müller <amuel...@ais.uni-bonn.de> wrote:

    On 11/15/2011 05:46 PM, Andreas Müller wrote:
    On 11/15/2011 04:28 PM, Bruce Southey wrote:
    On 11/14/2011 10:05 AM, Andreas Müller wrote:
    On 11/14/2011 04:23 PM, David Cournapeau wrote:
    On Mon, Nov 14, 2011 at 12:46 PM, Andreas Müller
    <amuel...@ais.uni-bonn.de> wrote:
    Hi everybody.
    When I did some normalization using numpy, I noticed that numpy.std
    uses more RAM than I was expecting.
    A quick google search gave me this:
    http://luispedro.org/software/ncreduce
    The site claims that std and other reduce operations are implemented
    naively with many temporaries.
    Is that true? And if so, is there a particular reason for that?
    This issue seems quite easy to fix.
    In particular the link I gave above provides code.
    The code provided only implements a few special cases: being more
    efficient in those cases only is indeed easy.
    I am particularly interested in the std function.
    Is this implemented as a separate function or as an instantiation
    of a general reduce operation?

    The 'On-line algorithm'
    (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm)
    could save you storage. If you know Cython, you could presumably
    make it fast as well (to address the explicit loop over the data).
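
    (For reference, a minimal sketch of that online update, Welford's
    method, in plain Python; "online_std" is just an illustrative name,
    and a Cython version would follow the same loop structure:)

        import math

        def online_std(x):
            # One pass, O(1) extra memory: keep a running mean and the
            # running sum of squared deviations from that mean.
            mean = 0.0
            m2 = 0.0
            for k, value in enumerate(x, start=1):
                delta = value - mean
                mean += delta / k
                m2 += delta * (value - mean)
            # population std (ddof=0), matching numpy.std's default
            return math.sqrt(m2 / k)

    (Dividing by k - 1 instead of k would give the sample version.)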


    My question was more along the lines of "why doesn't numpy use the
    online algorithm?"

    To be more precise: even without the online version, computing the
    variance from E(X^2) and E(X)^2 would be an improvement.
    It seems numpy centers the whole dataset; otherwise I can't explain
    why the memory needed depends on the number of examples.
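
    (A sketch of that formulation, assuming x is a 1-d float array;
    np.dot(x, x) yields the sum of squares as a scalar, so no
    array-sized temporary is needed, though this formula can lose
    precision when the mean is large relative to the spread:)

        import numpy as np

        def std_sum_of_squares(x):
            # var = E[X^2] - E[X]^2, computed without centering the data
            n = x.size
            ex2 = np.dot(x, x) / n   # E[X^2], no n-sized temporary
            ex = x.sum() / n         # E[X]
            return np.sqrt(ex2 - ex * ex)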



Yes, that is what it is doing. See line 63 in the function _var(), which is called by _std():
https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py
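
(Schematically, the centered computation does something like the
following; this is just an illustration of where the array-sized
temporaries come from, not the actual _methods.py code:)

    import numpy as np

    def var_centered(x):
        centered = x - x.mean()              # first n-sized temporary
        return (centered * centered).mean()  # second n-sized temporary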


Thanks for the clarification. I thought the function was somewhere in
the C code; I don't know why.
I'll see if I can reformulate the function.



