Re: [julia-users] [WIP] CSVReaders.jl

Iain Dunning Mon, 08 Dec 2014 10:42:47 -0800

Tried it out (built Julia 0.4 just to do it!) and made a CSV-to-JSON type
thing:

https://github.com/johnmyleswhite/CSVReaders.jl/issues/1

Quite excited about this - I find myself writing code that basically 
mangles a row into a type pretty often.
In fact, 90% of my needs would be satisfied by a variant of readall that 
takes a type, reads a row, and calls a function like 

function readrow{T}(::Type{T}, values::Vector{Any})
  # ... convert the raw row values into T's fields
  return T(...)
end

and returns a Vector{T}.
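
A minimal sketch of that idea, for illustration only. The `Person` type and the `readrows` driver are hypothetical names and are not part of CSVReaders.jl; the row splitting is a naive comma split with no quoting support. It is written in current Julia `where` syntax (on 0.4 the signature would be spelled `readrows{T}(...)`).

```julia
# Hypothetical example type; not part of CSVReaders.jl.
struct Person
    name::String
    age::Int
end

# Per-type hook: turn one row of raw values into a Person.
function readrow(::Type{Person}, values::Vector{Any})
    return Person(String(values[1]), parse(Int, values[2]))
end

# Hypothetical driver: read one row per line, hand the raw values to
# readrow, and collect the results into a Vector{T}.
function readrows(::Type{T}, io::IO) where {T}
    rows = T[]
    for line in eachline(io)
        isempty(line) && continue
        fields = Any[strip(field) for field in split(line, ',')]
        push!(rows, readrow(T, fields))
    end
    return rows
end

people = readrows(Person, IOBuffer("Ada,36\nGrace,45\n"))
```

Here `people` is a `Vector{Person}` with one element per input line; supporting another type only requires another `readrow` method.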

Not sure how that fits in with the design of this.

Cheers,
Iain


On Monday, December 8, 2014 1:29:46 PM UTC-5, Tim Holy wrote:
>
> Right, indeed I meant to suggest making the conversion to matrix form the
> very last step of the process. But obviously you didn't need that
> suggestion :-).
>
> --Tim 
>
> On Monday, December 08, 2014 10:20:00 AM John Myles White wrote: 
> > Looking at this again, the problem with doing reshape/transpose is that
> > it's very awkward when trying to read data in a stream, since you need to
> > undo the reshape and transpose before starting to read from the stream
> > again. I think the best solution to getting a row-major matrix of data is
> > to add a wrapper around the readall method from this package that handles
> > the final reshape and transpose operations when you're not reading in
> > streaming data.
> > 
> >  -- John 
> > 
> > On Dec 8, 2014, at 9:25 AM, Tim Holy <[email protected]> wrote:
> > > Does the reshape/transpose really take any appreciable time (compared to
> > > the I/O)?
> > > 
> > > --Tim 
> > > 
> > > On Monday, December 08, 2014 09:14:35 AM John Myles White wrote: 
> > >> Yes, this is how I've been doing things so far. 
> > >> 
> > >> -- John 
> > >> 
> > >> On Dec 8, 2014, at 9:12 AM, Tim Holy <[email protected]> wrote:
> > >>> My suspicion is you should read into a 1d vector (and use `append!`),
> > >>> then at the end do a reshape and finally a transpose. I bet that will be
> > >>> many times faster than any other alternative, because we have a really
> > >>> fast transpose now.
> > >>>
> > >>> The only disadvantage I see is taking twice as much memory as would be
> > >>> minimally needed. (This can be fixed once we have row-major arrays.)
> > >>> 
> > >>> --Tim 
> > >>> 
> > >>> On Monday, December 08, 2014 08:38:06 AM John Myles White wrote: 
> > >>>> I believe/hope the proposed solution will work for most cases, although
> > >>>> there's still a bunch of performance work left to be done. I think the
> > >>>> decoupling problem isn't as hard as it might seem since there are very
> > >>>> clearly distinct stages in parsing a CSV file. But we'll find out if the
> > >>>> indirection I've introduced causes performance problems when things
> > >>>> can't be inlined.
> > >>>>
> > >>>> While writing this package, I found the two most challenging problems to
> > >>>> be:
> > >>>>
> > >>>> (A) The disconnect between CSV files providing one row at a time and
> > >>>> Julia's usage of column major arrays, which encourage reading one column
> > >>>> at a time.
> > >>>> (B) The inability to easily resize! a matrix.
> > >>>> 
> > >>>> -- John 
> > >>>> 
> > >>>> On Dec 8, 2014, at 5:16 AM, Stefan Karpinski <[email protected]> wrote:
> > >>>>> Doh. Obfuscate the code quick, before anyone uses it! This is very nice
> > >>>>> and something I've always felt like we need for data formats like CSV –
> > >>>>> a way of decoupling the parsing of the format from the populating of a
> > >>>>> data structure with that data. It's a tough problem.
> > >>>>> 
> > >>>>> On Mon, Dec 8, 2014 at 8:08 AM, Tom Short <[email protected]> wrote:
> > >>>>> Exciting, John! Although your documentation may be "very sparse", the
> > >>>>> code is nicely documented.
> > >>>>> 
> > >>>>> On Mon, Dec 8, 2014 at 12:35 AM, John Myles White
> > >>>>> <[email protected]> wrote:
> > >>>>> Over the last month or so, I've been slowly working on a new library
> > >>>>> that defines an abstract toolkit for writing CSV parsers. The goal is
> > >>>>> to provide an abstract interface that users can implement in order to
> > >>>>> provide functions for reading data into their preferred data
> > >>>>> structures from CSV files. In principle, this approach should allow us
> > >>>>> to unify the code behind Base's readcsv and DataFrames's readtable
> > >>>>> functions.
> > >>>>>
> > >>>>> The library is still very much a work-in-progress, but I wanted to let
> > >>>>> others see what I've done so that I can start getting feedback on the
> > >>>>> design.
> > >>>>>
> > >>>>> Because the library makes heavy use of Nullables, you can only try out
> > >>>>> the library on Julia 0.4. If you're interested, it's available at
> > >>>>> https://github.com/johnmyleswhite/CSVReaders.jl
> > >>>>>
> > >>>>> For now, I've intentionally given very sparse documentation to
> > >>>>> discourage people from seriously using the library before it's
> > >>>>> officially released. But there are some examples in the README that
> > >>>>> should make clear how the library is intended to be used.
> > >>>>>
> > >>>>> -- John
>
>
