Tried it out (built Julia 0.4 just to do it!), made a CSV-to-JSON type thing:
https://github.com/johnmyleswhite/CSVReaders.jl/issues/1

Quite excited about this - I find myself writing code that basically mangles
a row into a type pretty often. In fact, 90% of my needs would be satisfied by
a variant of readall that takes a type, reads a row, and calls a function like

    function readrow(::Type{T}, values::Vector{Any})
        # ...
        return T(...)
    end

and returns a Vector{T}. Not sure how that fits in with the design of this.

Cheers,
Iain

On Monday, December 8, 2014 1:29:46 PM UTC-5, Tim Holy wrote:
> Right, indeed I meant to suggest making the conversion to matrix form the
> very last step of the process. But obviously you didn't need that
> suggestion :-).
>
> --Tim
>
> On Monday, December 08, 2014 10:20:00 AM John Myles White wrote:
> > Looking at this again, the problem with doing reshape/transpose is that
> > it's very awkward when trying to read data in a stream, since you need to
> > undo the reshape and transpose before starting to read from the stream
> > again. I think the best solution to getting a row-major matrix of data is
> > to add a wrapper around the readall method from this package that handles
> > the final reshape and transpose operations when you're not reading in
> > streaming data.
> >
> > -- John
> >
> > On Dec 8, 2014, at 9:25 AM, Tim Holy wrote:
> > > Does the reshape/transpose really take any appreciable time (compared
> > > to the I/O)?
> > >
> > > --Tim
> > >
> > > On Monday, December 08, 2014 09:14:35 AM John Myles White wrote:
> > >> Yes, this is how I've been doing things so far.
> > >>
> > >> -- John
> > >>
> > >> On Dec 8, 2014, at 9:12 AM, Tim Holy wrote:
> > >>> My suspicion is you should read into a 1d vector (and use `append!`),
> > >>> then at the end do a reshape and finally a transpose. I bet that will
> > >>> be many times faster than any other alternative, because we have a
> > >>> really fast transpose now.
> > >>>
> > >>> The only disadvantage I see is taking twice as much memory as would
> > >>> be minimally needed. (This can be fixed once we have row-major
> > >>> arrays.)
> > >>>
> > >>> --Tim
> > >>>
> > >>> On Monday, December 08, 2014 08:38:06 AM John Myles White wrote:
> > >>>> I believe/hope the proposed solution will work for most cases,
> > >>>> although there's still a bunch of performance work left to be done.
> > >>>> I think the decoupling problem isn't as hard as it might seem since
> > >>>> there are very clearly distinct stages in parsing a CSV file. But
> > >>>> we'll find out if the indirection I've introduced causes performance
> > >>>> problems when things can't be inlined.
> > >>>>
> > >>>> While writing this package, I found the two most challenging
> > >>>> problems to be:
> > >>>>
> > >>>> (A) The disconnect between CSV files providing one row at a time and
> > >>>>     Julia's usage of column major arrays, which encourage reading
> > >>>>     one column at a time.
> > >>>> (B) The inability to easily resize! a matrix.
> > >>>>
> > >>>> -- John
> > >>>>
> > >>>> On Dec 8, 2014, at 5:16 AM, Stefan Karpinski wrote:
> > >>>>> Doh. Obfuscate the code quick, before anyone uses it! This is very
> > >>>>> nice and something I've always felt like we need for data formats
> > >>>>> like CSV – a way of decoupling the parsing of the format from the
> > >>>>> populating of a data structure with that data. It's a tough
> > >>>>> problem.
> > >>>>>
> > >>>>> On Mon, Dec 8, 2014 at 8:08 AM, Tom Short wrote:
> > >>>>> Exciting, John! Although your documentation may be "very sparse",
> > >>>>> the code is nicely documented.
> > >>>>>
> > >>>>> On Mon, Dec 8, 2014 at 12:35 AM, John Myles White wrote:
> > >>>>> Over the last month or so, I've been slowly working on a new
> > >>>>> library that defines an abstract toolkit for writing CSV parsers.
> > >>>>> The goal is to provide an abstract interface that users can
> > >>>>> implement in order to provide functions for reading data into their
> > >>>>> preferred data structures from CSV files. In principle, this
> > >>>>> approach should allow us to unify the code behind Base's readcsv
> > >>>>> and DataFrames's readtable functions.
> > >>>>>
> > >>>>> The library is still very much a work-in-progress, but I wanted to
> > >>>>> let others see what I've done so that I can start getting feedback
> > >>>>> on the design.
> > >>>>>
> > >>>>> Because the library makes heavy use of Nullables, you can only try
> > >>>>> out the library on Julia 0.4. If you're interested, it's available
> > >>>>> at https://github.com/johnmyleswhite/CSVReaders.jl
> > >>>>>
> > >>>>> For now, I've intentionally given very sparse documentation to
> > >>>>> discourage people from seriously using the library before it's
> > >>>>> officially released. But there are some examples in the README that
> > >>>>> should make clear how the library is intended to be used.
> > >>>>>
> > >>>>> -- John
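
P.S. To make the readrow idea from the top of this message concrete, here is a rough sketch of what I have in mind. It is only an illustration under some assumptions: it does not use CSVReaders.jl at all, `Person` and `readrows` are made-up names, the parsing is a naive split on commas (no quoting or escaping), and it is written in current Julia syntax (`struct`, `where {T}`) rather than the 0.4 syntax.

```julia
# Hypothetical row type, used only for illustration.
struct Person
    name::String
    age::Int
end

# User-supplied hook: turn one row of raw values into a Person.
function readrow(::Type{Person}, values::Vector{Any})
    return Person(String(values[1]), parse(Int, values[2]))
end

# Hypothetical driver (not part of CSVReaders.jl): split each line on commas,
# hand the fields to readrow, and collect the results in a Vector{T}.
function readrows(::Type{T}, io::IO) where {T}
    rows = T[]
    for line in eachline(io)
        isempty(strip(line)) && continue  # skip blank lines
        values = Any[strip(field) for field in split(line, ',')]
        push!(rows, readrow(T, values))
    end
    return rows
end

people = readrows(Person, IOBuffer("Ada,36\nGrace,45\n"))
```

The point is that the only thing a user would write is the `readrow` method for their own type; where the raw values come from (here a naive split, ideally the parser in the package) would be someone else's problem.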
