On Mon, Dec 12, 2011 at 12:34 PM, Warren Weckesser <[email protected]> wrote:
>
>
> On Mon, Dec 12, 2011 at 10:22 AM, Chris.Barker <[email protected]>
> wrote:
>>
>> On 12/11/11 8:40 AM, Ralf Gommers wrote:
>> > On Wed, Dec 7, 2011 at 7:50 PM, Chris.Barker <[email protected]
>> > * If we have a good, fast ascii (or unicode?) to array reader,
>> > hopefully it could be leveraged for use in the more complex cases.
>> > So that rather than genfromtxt() being written from scratch, it
>> > would be a wrapper around the lower-level reader.
>> >
>> > You seem to be contradicting yourself here. The more complex cases are
>> > Wes' 10% and why genfromtxt is so hairy internally. There's always a
>> > trade-off between speed and handling complex corner cases. You want
>> > both.
>>
>> I don't think the version in my mind is contradictory (not quite).
>>
>> What I'm imagining is that a good, fast ascii to numpy array reader
>> could read a whole table in at once (the common, easy, fast case), but
>> it could also be used to read snippets of a file in at a time, which
>> could be leveraged to handle many of the more complex cases.
>>
>> I suppose there will always be cases where the user needs to write their
>> own converter from string to dtype, and there is simply no way to
>> leverage what I'm imagining to support that.
>>
>> Hmm, maybe there is -- for instance, if a "record" consisted of mostly
>> standard, easy-to-parse numbers, but one field was some weird text that
>> needed custom parsing, we could read it as a dtype, with a string for
>> that one weird field, and that could be converted in a post-processing
>> step.
>>
>> Maybe that wouldn't be any faster or easier, but it could be done...
>>
>> Anyway, whether you can leverage it for the full-featured version or
>> not, I do think there is call for a good, fast, 90% case text file parser.
>>
>>
>> Would anyone like to join/form a small working group to work on this?
>>
>> Wes, I'd like to see your Cython version -- maybe a starting point?
>>
>> -Chris
>
>
>
> I'm also working on a faster text file reader, so count me in. I've been
> experimenting in both C and Cython. I'll put it on github as soon as I
> can.
>
> Warren
>
>
>>
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> [email protected]
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Cool, Warren, I look forward to seeing it. I'm hopeful we can craft a
performant tool that will meet the needs of many projects (NumPy,
pandas, etc.)...
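
For illustration, here is a rough sketch of the post-processing idea Chris describes in the quoted text: read each record with a structured dtype that keeps the one awkward field as a string, then convert that field in a second pass. It is only a sketch under assumptions. The sample data, the "45d30m" degrees/minutes format, the field names, and the parse_lat helper are all made up for the example, and np.loadtxt stands in for the fast reader under discussion, which does not exist yet.

```python
import io

import numpy as np

# Hypothetical sample: two plain numeric columns plus one "weird" field
# (a degrees/minutes value) that a generic parser cannot convert.
text = io.StringIO(
    "1.0 2.5 45d30m\n"
    "2.0 3.5 10d15m\n"
)

# Step 1: read every record in one pass, keeping the odd field as a string.
raw_dtype = np.dtype([("x", "f8"), ("y", "f8"), ("lat", "U16")])
raw = np.loadtxt(text, dtype=raw_dtype)


def parse_lat(s):
    """Convert a string like '45d30m' to decimal degrees (made-up format)."""
    deg, rest = s.split("d")
    return float(deg) + float(rest.rstrip("m")) / 60.0


# Step 2: convert the string field in a post-processing step.
out = np.empty(raw.shape, dtype=[("x", "f8"), ("y", "f8"), ("lat", "f8")])
out["x"] = raw["x"]
out["y"] = raw["y"]
out["lat"] = [parse_lat(s) for s in raw["lat"]]
```

Whether this is faster than per-field converters inside the reader is exactly the open question in the thread. The sketch only shows that the two-step approach can be expressed with structured dtypes.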
