On 3 Oct 2012, at 17:48, Wes McKinney wrote:

> On Monday, October 1, 2012, Chris Barker wrote:
>
> Paul,
>
> Nice to see someone working on these issues, but:
>
> I'm not sure what problem you are trying to solve -- accumulating in a
> list is pretty efficient anyway -- not a whole lot of overhead.
>
> But if you do want to improve that, it may be better to change the
> accumulating method, rather than doing the double-read thing. I've
> written, and posted here, code that provides an Accumulator that uses
> numpy internally, so not much memory overhead. In the end, it's not
> any faster than accumulating in a list and then converting to an
> array, but it does use less memory.
>
> I also have a Cython version that is not quite done (darn regular job
> getting in the way) that is both faster and more memory efficient.
>
> Also, frankly, just writing the array pre-allocation and resizing
> code into loadtxt would not be a whole lot of code either, and would
> be both fast and memory efficient.
>
> Let me know if you want any of my code to play with.
>
> > However, I got the impression that someone was
> > working on a More Advanced (TM) C-based file reader, which will
> > replace loadtxt;
>
> yes -- I wonder what happened with that? Anyone?
>
> -CHB
>
> > this patch is intended as a useful thing to have
> > while we're waiting for that to appear.
> >
> > The patch passes all tests in the test suite, and documentation for
> > the kwarg has been added. I've modified all tests to include the
> > seekable kwarg, but that was mostly to check that all tests are passed
> > also with this kwarg. I guess it's a bit too late for 1.7.0 though?
> >
> > Should I make a pull request? I'm happy to take any and all
> > suggestions before I do.
> >
> > Cheers
> > Paul
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
> I've finally built a new, very fast C-based tokenizer/parser with type
> inference, NA-handling, etc. for pandas, working on it sporadically over
> the last month -- it's almost ready to ship. It's roughly an order of
> magnitude faster than loadtxt and uses very little temporary space. It
> should be easy to push upstream into NumPy to replace the innards of
> np.loadtxt if I can get a bit of help with the plumbing (it already
> yields structured arrays in addition to pandas DataFrames, so there
> isn't a great deal that needs doing).
>
> Blog post with CPU and memory benchmarks to follow -- will post a link here.
>
> - Wes

So Chris, looks like Wes has us beaten in every conceivable way. Hey, that's a good thing :) I suppose the thing to do now is to make sure Wes' function tackles the loadtxt test suite?

Paul
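
For readers following the thread, the pre-allocate-and-resize idea Chris mentions in the quoted message might look roughly like the sketch below. It assumes a single 1-D float column and a doubling growth policy. The `Accumulator` class and the `load_column` helper are hypothetical names used for illustration only; this is not Chris's posted code and not part of NumPy or loadtxt.

```python
import numpy as np


class Accumulator:
    """Growable 1-D buffer backed by a NumPy array (hypothetical sketch)."""

    def __init__(self, dtype=float, capacity=16):
        # Pre-allocate a buffer; only the first `_size` entries are valid.
        self._data = np.empty(max(1, capacity), dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == self._data.shape[0]:
            # Buffer is full: allocate one twice as large and copy the
            # used part across. Doubling keeps appends amortized O(1).
            grown = np.empty(2 * self._data.shape[0], dtype=self._data.dtype)
            grown[:self._size] = self._data[:self._size]
            self._data = grown
        self._data[self._size] = value
        self._size += 1

    def __len__(self):
        return self._size

    def to_array(self):
        # Return a trimmed copy so the caller never sees unused capacity.
        return self._data[:self._size].copy()


def load_column(lines, dtype=float):
    """Parse one number per line in a single pass, skipping blank lines."""
    acc = Accumulator(dtype=dtype)
    for line in lines:
        line = line.strip()
        if line:
            acc.append(float(line))
    return acc.to_array()
```

The point of the design is that the file is read once and the values are stored as packed machine numbers rather than as Python objects in a list, which is where the memory saving Chris describes would come from. Whether it is faster than list accumulation depends on the data and is not shown here.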