Re: [Numpy-discussion] memory-efficient loadtxt

Paul Anton Letnes Wed, 03 Oct 2012 08:58:54 -0700

On 3 Oct 2012, at 17:48, Wes McKinney wrote:

> On Monday, October 1, 2012, Chris Barker wrote:
> Paul,
> 
> Nice to see someone working on these issues, but:
> 
> I'm not sure what problem you are trying to solve -- accumulating in a
> list is pretty efficient anyway -- there's not a whole lot of overhead.
> 
> But if you do want to improve that, it may be better to change the
> accumulating method, rather than doing the double-read thing. I've
> written, and posted here, code that provides an Accumulator that uses
> numpy internally, so not much memory overhead. In the end, it's not
> any faster than accumulating in a list and then converting to an
> array, but it does use less memory.
> 
> I also have a Cython version that is not quite done (darn regular job
> getting in the way) that is both faster and more memory efficient.
> 
> Also, frankly, just writing the array pre-allocation and re-sizing
> code into loadtxt would not be a whole lot of code either, and would
> be both fast and memory efficient.
> 
> Let me know if you want any of my code to play with.
> 
> >  However, I got the impression that someone was
> > working on a More Advanced (TM) C-based file reader, which will
> > replace loadtxt;
> 
> yes -- I wonder what happened with that? Anyone?
> 
> -CHB
> 
> 
> 
> > this patch is intended as a useful thing to have
> > while we're waiting for that to appear.
> >
> > The patch passes all tests in the test suite, and documentation for
> > the kwarg has been added. I've modified all tests to include the
> > seekable kwarg, but that was mostly to check that all tests also pass
> > with this kwarg. I guess it's a bit too late for 1.7.0, though?
> >
> > Should I make a pull request? I'm happy to take any and all
> > suggestions before I do.
> >
> > Cheers
> > Paul
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> --
> 
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
> 
> chris.bar...@noaa.gov
> 
> I've been building a new, very fast C-based tokenizer/parser with type
> inference, NA-handling, etc. for pandas, sporadically over the last month,
> and it's almost ready to ship. It's roughly an order of magnitude faster than
> loadtxt and uses very little temporary space. It should be easy to push upstream
> into NumPy to replace the innards of np.loadtxt if I can get a bit of help
> with the plumbing (it already yields structured arrays in addition to pandas
> DataFrames, so there isn't a great deal that needs doing).
> 
> A blog post with CPU and memory benchmarks will follow; I'll post a link here.
> 
> - Wes
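Chris's two suggestions in the quoted text (an accumulator backed by a NumPy array, and writing pre-allocation and re-sizing code directly into loadtxt) rest on the same technique: keep parsed values in a NumPy buffer and over-allocate when it fills, so each append is amortized O(1) and no list of Python float objects is built. A minimal sketch, assuming whitespace-separated (or single-delimiter) numeric data with `#` comments; `Accumulator` and `load_text_accumulated` are hypothetical names for illustration, not Chris's posted code and not a NumPy API:

```python
import io

import numpy as np


class Accumulator:
    """Hypothetical NumPy-backed row accumulator (illustration only)."""

    def __init__(self, ncols, initial_rows=16):
        # Pre-allocate a float buffer; at least one row so doubling works.
        self._buf = np.empty((max(1, initial_rows), ncols), dtype=float)
        self._n = 0  # number of rows actually filled so far

    def append(self, row):
        if len(row) != self._buf.shape[1]:
            raise ValueError("expected %d columns, got %d" % (self._buf.shape[1], len(row)))
        if self._n == self._buf.shape[0]:
            # Buffer full: allocate twice the rows and copy the filled part over.
            bigger = np.empty((2 * self._buf.shape[0], self._buf.shape[1]), dtype=float)
            bigger[: self._n] = self._buf[: self._n]
            self._buf = bigger
        self._buf[self._n] = row
        self._n += 1

    def to_array(self):
        # Copy only the filled rows; the oversized buffer is freed
        # once the accumulator itself is discarded.
        return self._buf[: self._n].copy()


def load_text_accumulated(lines, delimiter=None, comments="#"):
    """Hypothetical single-pass loader: numeric text lines -> 2-D float array."""
    acc = None
    for raw in lines:
        line = raw.split(comments, 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split(delimiter)
        if acc is None:
            acc = Accumulator(len(fields))  # column count from the first data row
        acc.append([float(f) for f in fields])
    return acc.to_array() if acc is not None else np.empty((0, 0))


if __name__ == "__main__":
    print(load_text_accumulated(io.StringIO("1 2\n# note\n3 4\n")))
```

Doubling the capacity means at most about half of the buffer is ever unused, and peak memory stays a small multiple of the final array's 8 bytes per value, whereas a list-based loader holds a separate Python object per value until the final conversion.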
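Paul's patch, with its `seekable` kwarg, appears to take the approach Chris calls "the double-read thing": when the input can be rewound, a first pass only counts rows, the file is seeked back to the start, and a second pass parses straight into an array of exactly the right size. A minimal sketch of that idea, assuming a seekable text file of numeric columns with `#` comments; `load_text_two_pass` is a hypothetical name, and this is not Paul's patch or the actual behaviour of the kwarg it adds:

```python
import io

import numpy as np


def load_text_two_pass(fh, delimiter=None, comments="#"):
    """Hypothetical two-pass loader for seekable text files (illustration only)."""

    def data_lines():
        # Yield non-empty lines with comments and surrounding whitespace removed.
        for raw in fh:
            line = raw.split(comments, 1)[0].strip()
            if line:
                yield line

    start = fh.tell()  # remember where the data starts so we can rewind

    # Pass 1: count data rows (and columns, from the first row); store nothing.
    nrows, ncols = 0, None
    for line in data_lines():
        if ncols is None:
            ncols = len(line.split(delimiter))
        nrows += 1
    if ncols is None:
        return np.empty((0, 0))

    # Pass 2: rewind and parse straight into an exactly-sized array.
    fh.seek(start)
    out = np.empty((nrows, ncols), dtype=float)
    for i, line in enumerate(data_lines()):
        fields = line.split(delimiter)
        if len(fields) != ncols:
            raise ValueError("row %d has %d columns, expected %d" % (i, len(fields), ncols))
        out[i] = [float(f) for f in fields]
    return out


if __name__ == "__main__":
    print(load_text_two_pass(io.StringIO("# x y\n1 2\n3 4\n")))
```

The cost is reading the input twice and requiring `seek`, which rules out pipes and other one-shot streams; in exchange the only large allocation is the final array itself.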



So Chris, it looks like Wes has us beaten in every conceivable way. Hey, that's a
good thing :)  I suppose the next step is to make sure Wes' function passes
the loadtxt test suite?

Paul
