Iain, I didn't implement that function because it's pretty wasteful of memory. Instead, there's a non-public function, readnrows(), that I'll make public; it lets you do incremental reading. The thing to keep in mind is that incremental reading is tricky: you need to deal with the header and other edge cases (like what happens if you hit a blank row). That's why I've kept that functionality private so far.
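
To give a feel for it, incremental reading would end up looking roughly like the sketch below. This is only a sketch: the readnrows() signature shown here is a guess at what a public version might look like, and CSVReader/process are placeholder names, not part of the current API.

    # Sketch only: the signatures and type names below are placeholders,
    # not the actual CSVReaders API.
    io = open("data.csv", "r")
    reader = CSVReader(io)        # hypothetical reader state; consumes the header once
    while !eof(io)
        chunk = readnrows(reader, 1000)   # read up to 1000 rows, skipping blank rows
        isempty(chunk) && break
        process(chunk)                    # user-supplied per-chunk function
    end
    close(io)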
-- John

On Dec 8, 2014, at 1:42 PM, Iain Dunning <[email protected]> wrote:

> Tried it out (built Julia 0.4 just to do it!), made a CSV-to-JSON type thing:
>
> https://github.com/johnmyleswhite/CSVReaders.jl/issues/1
>
> Quite excited about this - I find myself writing code that basically mangles
> a row into a type pretty often. In fact, 90% of my needs would be satisfied
> by a variant of readall that takes a type, reads a row, and calls a function
> like
>
>     function readrow(::Type{T}, values::Vector{Any})
>         # ...
>         return T(...)
>     end
>
> and returns a Vector{T}.
>
> Not sure how that fits in with the design of this.
>
> Cheers,
> Iain
>
> On Monday, December 8, 2014 1:29:46 PM UTC-5, Tim Holy wrote:
> Right, indeed I meant to suggest making the conversion to matrix form the
> very last step of the process. But obviously you didn't need that
> suggestion :-).
>
> --Tim
>
> On Monday, December 08, 2014 10:20:00 AM John Myles White wrote:
> > Looking at this again, the problem with doing reshape/transpose is that
> > it's very awkward when trying to read data in a stream, since you need to
> > undo the reshape and transpose before starting to read from the stream
> > again. I think the best solution to getting a row-major matrix of data is
> > to add a wrapper around the readall method from this package that handles
> > the final reshape and transpose operations when you're not reading in
> > streaming data.
> >
> > -- John
> >
> > On Dec 8, 2014, at 9:25 AM, Tim Holy <[email protected]> wrote:
> > > Does the reshape/transpose really take any appreciable time (compared
> > > to the I/O)?
> > >
> > > --Tim
> > >
> > > On Monday, December 08, 2014 09:14:35 AM John Myles White wrote:
> > > > Yes, this is how I've been doing things so far.
> > > >
> > > > -- John
> > > >
> > > > On Dec 8, 2014, at 9:12 AM, Tim Holy <[email protected]> wrote:
> > > > > My suspicion is you should read into a 1d vector (and use
> > > > > `append!`), then at the end do a reshape and finally a transpose.
> > > > > I bet that will be many times faster than any other alternative,
> > > > > because we have a really fast transpose now.
> > > > >
> > > > > The only disadvantage I see is taking twice as much memory as would
> > > > > be minimally needed. (This can be fixed once we have row-major
> > > > > arrays.)
> > > > >
> > > > > --Tim
> > > > >
> > > > > On Monday, December 08, 2014 08:38:06 AM John Myles White wrote:
> > > > > > I believe/hope the proposed solution will work for most cases,
> > > > > > although there's still a bunch of performance work left to be
> > > > > > done. I think the decoupling problem isn't as hard as it might
> > > > > > seem since there are very clearly distinct stages in parsing a
> > > > > > CSV file. But we'll find out if the indirection I've introduced
> > > > > > causes performance problems when things can't be inlined.
> > > > > >
> > > > > > While writing this package, I found the two most challenging
> > > > > > problems to be:
> > > > > >
> > > > > > (A) The disconnect between CSV files providing one row at a time
> > > > > > and Julia's usage of column major arrays, which encourage reading
> > > > > > one column at a time.
> > > > > > (B) The inability to easily resize! a matrix.
> > > > > >
> > > > > > -- John
> > > > > >
> > > > > > On Dec 8, 2014, at 5:16 AM, Stefan Karpinski
> > > > > > <[email protected]> wrote:
> > > > > > Doh. Obfuscate the code quick, before anyone uses it! This is
> > > > > > very nice and something I've always felt like we need for data
> > > > > > formats like CSV – a way of decoupling the parsing of the format
> > > > > > from the populating of a data structure with that data. It's a
> > > > > > tough problem.
> > > > > >
> > > > > > On Mon, Dec 8, 2014 at 8:08 AM, Tom Short <[email protected]>
> > > > > > wrote:
> > > > > > Exciting, John! Although your documentation may be "very sparse",
> > > > > > the code is nicely documented.
> > > > > >
> > > > > > On Mon, Dec 8, 2014 at 12:35 AM, John Myles White
> > > > > > <[email protected]> wrote:
> > > > > > Over the last month or so, I've been slowly working on a new
> > > > > > library that defines an abstract toolkit for writing CSV parsers.
> > > > > > The goal is to provide an abstract interface that users can
> > > > > > implement in order to provide functions for reading data into
> > > > > > their preferred data structures from CSV files. In principle,
> > > > > > this approach should allow us to unify the code behind Base's
> > > > > > readcsv and DataFrames's readtable functions.
> > > > > >
> > > > > > The library is still very much a work-in-progress, but I wanted
> > > > > > to let others see what I've done so that I can start getting
> > > > > > feedback on the design.
> > > > > >
> > > > > > Because the library makes heavy use of Nullables, you can only
> > > > > > try out the library on Julia 0.4. If you're interested, it's
> > > > > > available at https://github.com/johnmyleswhite/CSVReaders.jl
> > > > > >
> > > > > > For now, I've intentionally given very sparse documentation to
> > > > > > discourage people from seriously using the library before it's
> > > > > > officially released. But there are some examples in the README
> > > > > > that should make clear how the library is intended to be used.
> > > > > >
> > > > > > -- John
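
P.S. For anyone skimming the thread above: Tim's append!-then-reshape-then-transpose idea looks, in rough outline, like the sketch below. It's only a sketch under simplifying assumptions (an all-numeric CSV, no quoting, no missing values, no header); parsefield and read_row_major are illustrative names, not part of CSVReaders.

    # Rough sketch (not CSVReaders code): read every row into a flat 1-d
    # buffer, then reshape and transpose once at the end.
    # `parsefield` is a stand-in for real field parsing.
    parsefield(s) = parse(Float64, s)   # assumes an all-numeric file

    function read_row_major(io::IO)
        buffer = Float64[]
        ncols = 0
        for line in eachline(io)
            fields = split(chomp(line), ',')
            ncols = length(fields)
            append!(buffer, map(parsefield, fields))   # grow the flat vector row by row
        end
        ncols == 0 && return zeros(Float64, 0, 0)
        nrows = div(length(buffer), ncols)
        # The buffer was filled row by row, so reshape to (ncols, nrows) and
        # transpose to get the usual nrows-by-ncols matrix. The transpose copy
        # is the factor-of-two memory cost mentioned above.
        return transpose(reshape(buffer, ncols, nrows))
    end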
