A few thoughts:
1) yes, a faster, more memory efficient text file parser would be great.
Yeah, if your workflow relies on parsing lots of huge text files, you
probably need another workflow. But it's a really, really common thing to
need to do -- why not do it fast?
2) you are describing a special case where you know the data size a priori
(e.g. not streaming), dtypes are readily apparent from a small sample, and
in general your data is not messy.
On 28 Oct 2014 20:10, Chris Barker chris.bar...@noaa.gov wrote:
Memory efficiency -- something like my growable array is not all that
hard to implement and pretty darn quick -- you just do the usual trick:
over-allocate a bit of memory, and when it gets full, re-allocate a larger
chunk.
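The over-allocation trick Chris describes can be sketched in a few lines. A minimal, hypothetical Python version (the class name, growth factor, and initial capacity are illustrative, not taken from his actual code):

```python
import numpy as np

class GrowableArray:
    """Append-only 1-D array using the usual trick: over-allocate,
    and when the buffer fills, re-allocate a larger chunk and copy."""

    def __init__(self, dtype=np.float64, capacity=16):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):
            # Buffer full: grow by ~1.5x so appends stay amortized O(1).
            new = np.empty(int(len(self._data) * 1.5) + 1,
                           dtype=self._data.dtype)
            new[:self._size] = self._data
            self._data = new
        self._data[self._size] = value
        self._size += 1

    def values(self):
        # View of the filled portion; copy it if it must outlive appends.
        return self._data[:self._size]

g = GrowableArray()
for i in range(100):
    g.append(i * 0.5)
```

Because the buffer grows geometrically, the total copying work stays linear in the number of appends, which is why this is "pretty darn quick" despite the occasional reallocation.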
Can't ...
As a bit of an aside, I have just discovered that for fixed-width text
data, numpy's text readers seem to edge out pandas' read_fwf(), and numpy
has the advantage of being able to specify the dtypes ahead of time (it
seems the pandas version just won't allow it, which means I end up with ...
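For reference, specifying dtypes up front for fixed-width data looks something like this; the sample data, field names, and widths are made up for illustration:

```python
import io
import numpy as np

# Hypothetical fixed-width sample: a 4-char int field, then an
# 8-char float field.
text = "  12  3.5000\n 345 12.2500\n"

# genfromtxt accepts a sequence of field widths as `delimiter`,
# and lets us fix the dtypes up front instead of inferring them.
arr = np.genfromtxt(io.StringIO(text), delimiter=[4, 8],
                    dtype=[("id", "i4"), ("val", "f8")])
```

With pandas, `pd.read_fwf(io.StringIO(text), widths=[4, 8], header=None)` reads the same layout, but the dtypes are inferred rather than declared, which is the limitation being described above.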
On 28.10.2014 21:24, Nathaniel Smith wrote:
On 28 Oct 2014 20:10, Chris Barker chris.bar...@noaa.gov wrote:
Memory efficiency -- something like my growable array is not all that
hard to implement and pretty darn quick -- you just do the usual trick:
over-allocate ...
On Tue, Oct 28, 2014 at 1:24 PM, Nathaniel Smith n...@pobox.com wrote:
Memory efficiency -- something like my growable array is not all that
hard to implement and pretty darn quick -- you just do the usual trick:
over-allocate a bit of memory, and when it gets full re-allocate a larger ...
you should have a read here:
http://wesmckinney.com/blog/?p=543
going below 2x memory usage on read-in is non-trivial and costly in terms
of performance
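The ~2x figure comes from the parse-then-convert pattern: just before the conversion finishes, the intermediate row list and the final array coexist in memory. A minimal illustration of the pattern (synthetic data, not the actual loadtxt source):

```python
import io
import numpy as np

# Synthetic two-column text data standing in for a real file.
text = "\n".join("%d %d" % (i, i * i) for i in range(1000))

# loadtxt-style pattern: parse every row into a Python list first,
# then convert.  While np.array() runs, the row list and the final
# array are both alive, so peak memory is roughly 2x the data size
# (plus the considerable per-object overhead of the list itself).
rows = [[float(tok) for tok in line.split()] for line in io.StringIO(text)]
arr = np.array(rows)   # second full copy of the data
del rows               # the list can only be freed after conversion
```

In practice the list-of-lists stage is even worse than 2x, since each Python float is a boxed object; that overhead is part of what the blog post above measures.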
On Oct 26, 2014, at 4:46 AM, Saullo Castro saullogiov...@gmail.com wrote:
I would like to start working on a memory efficient alternative for
np.loadtxt and np.genfromtxt.
I'm not sure why the memory doubling is necessary. Isn't it possible to
preallocate the arrays and write to them? I suppose this might be
inefficient though, in case you end up reading only a small subset of rows
out of a mostly corrupt file? But that seems to be a rather uncommon corner
case.
On Sun, Oct 26, 2014 at 1:21 PM, Eelco Hoogendoorn
hoogendoorn.ee...@gmail.com wrote:
I'm not sure why the memory doubling is necessary. Isn't it possible to
preallocate the arrays and write to them?
Not without reading the whole file first to know how many rows to preallocate.
--
Robert Kern
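Robert's point can be turned into a workaround: read the file twice, first just to count rows, then to fill a preallocated array. A hedged sketch (the helper name is hypothetical, and it assumes a seekable, well-formed, whitespace-delimited file):

```python
import io
import numpy as np

def load_two_pass(f, dtype=np.float64):
    """Avoid the doubling by counting rows in a first pass, then
    preallocating and filling in a second pass.  Trades a second
    read of the file for ~1x memory overhead."""
    nrows = sum(1 for _ in f)          # pass 1: count rows
    f.seek(0)
    ncols = len(f.readline().split())  # peek at the first row
    f.seek(0)
    out = np.empty((nrows, ncols), dtype=dtype)
    for i, line in enumerate(f):       # pass 2: fill in place
        out[i] = [float(tok) for tok in line.split()]
    return out

data = load_two_pass(io.StringIO("1 2 3\n4 5 6\n"))
```

This only works when the input is seekable (a real file, not a pipe or stream), which is exactly the "special case" distinction drawn elsewhere in this thread.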
On 26 October 2014 12:54, Jeff Reback jeffreb...@gmail.com wrote:
you should have a read here:
http://wesmckinney.com/blog/?p=543
going below 2x memory usage on read-in is non-trivial and costly in
terms of performance
If you know in advance the number of rows (because it is in the ...
On 26 Oct 2014, at 02:21 pm, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com
wrote:
I'm not sure why the memory doubling is necessary. Isn't it possible to
preallocate the arrays and write to them? I suppose this might be inefficient
though, in case you end up reading only a small subset of rows out of a
mostly corrupt file? But that seems to be a rather uncommon corner case.
you are describing a special case where you know the data size a priori
(e.g. not streaming), dtypes are readily apparent from a small sample,
and in general your data is not messy.
I would agree that if these can be satisfied, then you can achieve closer
to a 1x memory overhead.
using bcolz is ...
On 26 Oct 2014 11:54, Jeff Reback jeffreb...@gmail.com wrote:
you should have a read here:
http://wesmckinney.com/blog/?p=543
going below 2x memory usage on read-in is non-trivial and costly in
terms of performance
On Linux you can probably go below 2x overhead easily, by exploiting the ...
On 26/10/14 09:46, Saullo Castro wrote:
I would like to start working on a memory efficient alternative for
np.loadtxt and np.genfromtxt that uses arrays instead of lists to store
the data while the file iterator is exhausted.
...
I would be glad if you could share your experience on this
On Sun, 26 Oct 2014 17:42:32 +0100, Daniele Nicolodi dani...@grinta.net
wrote, Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt
and np.genfromtxt:
At 06:32 AM 10/26/2014, you wrote:
On Sun, Oct 26, 2014 at 1:21 PM, Eelco Hoogendoorn
hoogendoorn.ee...@gmail.com wrote:
I'm not sure why the memory doubling is necessary. Isn't it possible to
preallocate the arrays and write to them?
Not without reading the whole file first to know how many rows to
preallocate.