Re: [julia-users] [WIP] CSVReaders.jl

Tim Holy Mon, 08 Dec 2014 10:30:07 -0800

Right, indeed I meant to suggest making the conversion to matrix form the very 
last step of the process. But obviously you didn't need that suggestion :-).


--Tim

On Monday, December 08, 2014 10:20:00 AM John Myles White wrote:
> Looking at this again, the problem with doing reshape/transpose is that it's
> very awkward when trying to read data in a stream, since you need to undo
> the reshape and transpose before starting to read from the stream again. I
> think the best solution to getting a row-major matrix of data is to add a
> wrapper around the readall method from this package that handles the final
> reshape and transpose operations when you're not reading in streaming data.
> 
>  -- John
> 
> On Dec 8, 2014, at 9:25 AM, Tim Holy <[email protected]> wrote:
> > Does the reshape/transpose really take any appreciable time (compared to
> > the I/O)?
> > 
> > --Tim
> > 
> > On Monday, December 08, 2014 09:14:35 AM John Myles White wrote:
> >> Yes, this is how I've been doing things so far.
> >> 
> >> -- John
> >> 
> >> On Dec 8, 2014, at 9:12 AM, Tim Holy <[email protected]> wrote:
> >>> My suspicion is you should read into a 1d vector (and use `append!`),
> >>> then at the end do a reshape and finally a transpose. I bet that will be
> >>> many times faster than any other alternative, because we have a really
> >>> fast transpose now.
> >>> 
> >>> The only disadvantage I see is taking twice as much memory as would be
> >>> minimally needed. (This can be fixed once we have row-major arrays.)
> >>> 
> >>> --Tim
> >>> 
> >>> On Monday, December 08, 2014 08:38:06 AM John Myles White wrote:
> >>>> I believe/hope the proposed solution will work for most cases, although
> >>>> there's still a bunch of performance work left to be done. I think the
> >>>> decoupling problem isn't as hard as it might seem since there are very
> >>>> clearly distinct stages in parsing a CSV file. But we'll find out if the
> >>>> indirection I've introduced causes performance problems when things
> >>>> can't be inlined.
> >>>> 
> >>>> While writing this package, I found the two most challenging problems
> >>>> to be:
> >>>> 
> >>>> (A) The disconnect between CSV files providing one row at a time and
> >>>> Julia's usage of column major arrays, which encourage reading one
> >>>> column at a time.
> >>>> (B) The inability to easily resize! a matrix.
> >>>> 
> >>>> -- John
> >>>> 
> >>>> On Dec 8, 2014, at 5:16 AM, Stefan Karpinski <[email protected]> wrote:
> >>>>> Doh. Obfuscate the code quick, before anyone uses it! This is very
> >>>>> nice and something I've always felt like we need for data formats like
> >>>>> CSV – a way of decoupling the parsing of the format from the populating
> >>>>> of a data structure with that data. It's a tough problem.
> >>>>> 
> >>>>> On Mon, Dec 8, 2014 at 8:08 AM, Tom Short <[email protected]> wrote:
> >>>>> Exciting, John! Although your documentation may be "very sparse", the
> >>>>> code is nicely documented.
> >>>>> 
> >>>>> On Mon, Dec 8, 2014 at 12:35 AM, John Myles White
> >>>>> <[email protected]> wrote:
> >>>>> Over the last month or so, I've been slowly working on a new library
> >>>>> that defines an abstract toolkit for writing CSV parsers. The goal is to
> >>>>> provide an abstract interface that users can implement in order to
> >>>>> provide functions for reading data into their preferred data structures
> >>>>> from CSV files. In principle, this approach should allow us to unify the
> >>>>> code behind Base's readcsv and DataFrames's readtable functions.
> >>>>> 
> >>>>> The library is still very much a work-in-progress, but I wanted to let
> >>>>> others see what I've done so that I can start getting feedback on the
> >>>>> design.
> >>>>> 
> >>>>> Because the library makes heavy use of Nullables, you can only try out
> >>>>> the library on Julia 0.4. If you're interested, it's available at
> >>>>> https://github.com/johnmyleswhite/CSVReaders.jl
> >>>>> 
> >>>>> For now, I've intentionally given very sparse documentation to
> >>>>> discourage people from seriously using the library before it's
> >>>>> officially released. But there are some examples in the README that
> >>>>> should make clear how the library is intended to be used.
> >>>>> 
> >>>>> -- John
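
For readers following the thread, here is a minimal sketch of the approach
discussed above: append each CSV row to a flat 1-d vector while streaming, and
make the reshape and transpose the very last step. It is not code from
CSVReaders.jl. The helper names `read_rows!` and `to_matrix` are hypothetical,
the input is assumed to be purely numeric with no header or quoting, and the
sketch assumes a current Julia 1.x, where `permutedims` produces a materialized
transposed copy (the thread itself predates this and refers to `transpose`).

```julia
# Sketch only: `read_rows!` and `to_matrix` are hypothetical helper names,
# not functions from CSVReaders.jl or Base.

# Append the fields of every line in `io` to the flat buffer `buf`.
# Assumes numeric, comma-separated fields with no quoting and no header.
function read_rows!(buf::Vector{Float64}, io::IO)
    nrows = 0
    for line in eachline(io)
        isempty(strip(line)) && continue          # skip blank lines
        append!(buf, parse.(Float64, split(line, ',')))
        nrows += 1
    end
    return nrows
end

# Final step, done once after all streaming reads have finished.
# The buffer holds the data row by row, so reshape it as (ncols, nrows),
# which puts each CSV row in one column, then swap the two dimensions.
function to_matrix(buf::Vector{Float64}, nrows::Int)
    ncols = div(length(buf), nrows)
    return permutedims(reshape(buf, ncols, nrows))
end

buf = Float64[]
n  = read_rows!(buf, IOBuffer("1,2\n3,4\n"))   # first chunk of the stream
n += read_rows!(buf, IOBuffer("5,6\n"))        # later chunk, same flat buffer
A = to_matrix(buf, n)                          # 3x2 matrix, one CSV row per row
```

Because the buffer stays flat until `to_matrix` is called, further chunks can
be appended without undoing anything, which is the streaming concern raised in
the thread. The `permutedims` call allocates a second copy of the data, in line
with the remark above about using twice the minimally needed memory.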
