My suspicion is you should read into a 1d vector (and use `append!`), then at the end do a reshape and finally a transpose. I bet that will be many times faster than any other alternative, because we have a really fast transpose now.
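A minimal sketch of that approach, not taken from CSVReaders.jl or Base: the helper name `read_rows` is hypothetical, and it assumes purely numeric, comma-separated rows with no header, no quoting, and the same number of fields on every row. It is written for current Julia, where `transpose` returns a lazy wrapper, so `permutedims` is used to produce the materialized copy.

```julia
# Hypothetical helper: parse comma-separated numeric rows into a Matrix{Float64}.
# Assumes a fixed number of fields per row and no quoting or header line.
function read_rows(io::IO)
    buf = Float64[]   # flat 1-d buffer, filled in row order
    ncols = 0
    nrows = 0
    for line in eachline(io)
        isempty(line) && continue
        fields = split(line, ',')
        if nrows == 0
            ncols = length(fields)
        elseif length(fields) != ncols
            error("row $(nrows + 1) has $(length(fields)) fields, expected $ncols")
        end
        # Growing a Vector is cheap; growing a Matrix row by row is not.
        append!(buf, parse.(Float64, fields))
        nrows += 1
    end
    # Each CSV row is contiguous in buf, so it forms one column of the
    # ncols-by-nrows reshape; permutedims copies that into nrows-by-ncols.
    return permutedims(reshape(buf, ncols, nrows))
end

A = read_rows(IOBuffer("1,2,3\n4,5,6\n"))
```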
The only disadvantage I see is taking twice as much memory as would be minimally needed. (This can be fixed once we have row-major arrays.)

--Tim

On Monday, December 08, 2014 08:38:06 AM John Myles White wrote:
> I believe/hope the proposed solution will work for most cases, although
> there's still a bunch of performance work left to be done. I think the
> decoupling problem isn't as hard as it might seem since there are very
> clearly distinct stages in parsing a CSV file. But we'll find out if the
> indirection I've introduced causes performance problems when things can't
> be inlined.
>
> While writing this package, I found the two most challenging problems to be:
>
> (A) The disconnect between CSV files providing one row at a time and Julia's
> usage of column major arrays, which encourage reading one column at a time.
> (B) The inability to easily resize! a matrix.
>
> -- John
>
> On Dec 8, 2014, at 5:16 AM, Stefan Karpinski <[email protected]> wrote:
> > Doh. Obfuscate the code quick, before anyone uses it! This is very nice
> > and something I've always felt like we need for data formats like CSV – a
> > way of decoupling the parsing of the format from the populating of a data
> > structure with that data. It's a tough problem.
> >
> > On Mon, Dec 8, 2014 at 8:08 AM, Tom Short <[email protected]> wrote:
> > Exciting, John! Although your documentation may be "very sparse", the code
> > is nicely documented.
> >
> > On Mon, Dec 8, 2014 at 12:35 AM, John Myles White
> > <[email protected]> wrote: Over the last month or so, I've been
> > slowly working on a new library that defines an abstract toolkit for
> > writing CSV parsers. The goal is to provide an abstract interface that
> > users can implement in order to provide functions for reading data into
> > their preferred data structures from CSV files. In principle, this
> > approach should allow us to unify the code behind Base's readcsv and
> > DataFrames's readtable functions.
> >
> > The library is still very much a work-in-progress, but I wanted to let
> > others see what I've done so that I can start getting feedback on the
> > design.
> >
> > Because the library makes heavy use of Nullables, you can only try out the
> > library on Julia 0.4. If you're interested, it's available at
> > https://github.com/johnmyleswhite/CSVReaders.jl
> >
> > For now, I've intentionally given very sparse documentation to discourage
> > people from seriously using the library before it's officially released.
> > But there are some examples in the README that should make clear how the
> > library is intended to be used.
> >
> > -- John
