Re: [julia-users] [WIP] CSVReaders.jl

John Myles White Mon, 08 Dec 2014 09:15:04 -0800

Yes, this is how I've been doing things so far.

 -- John


On Dec 8, 2014, at 9:12 AM, Tim Holy <[email protected]> wrote:

> My suspicion is you should read into a 1d vector (and use `append!`), then at 
> the end do a reshape and finally a transpose. I bet that will be many times 
> faster than any other alternative, because we have a really fast transpose 
> now.
> 
> The only disadvantage I see is taking twice as much memory as would be 
> minimally needed. (This can be fixed once we have row-major arrays.)
> 
> --Tim
> 
> On Monday, December 08, 2014 08:38:06 AM John Myles White wrote:
>> I believe/hope the proposed solution will work for most cases, although
>> there's still a bunch of performance work left to be done. I think the
>> decoupling problem isn't as hard as it might seem since there are very
>> clearly distinct stages in parsing a CSV file. But we'll find out if the
>> indirection I've introduced causes performance problems when things can't
>> be inlined.
>> 
>> While writing this package, I found the two most challenging problems to be:
>> 
>> (A) The disconnect between CSV files providing one row at a time and Julia's
>> use of column-major arrays, which encourage reading one column at a time.
>> (B) The inability to easily `resize!` a matrix.
>> 
>> -- John
>> 
>> On Dec 8, 2014, at 5:16 AM, Stefan Karpinski <[email protected]> wrote:
>>> Doh. Obfuscate the code quick, before anyone uses it! This is very nice
>>> and something I've always felt like we need for data formats like CSV – a
>>> way of decoupling the parsing of the format from the populating of a data
>>> structure with that data. It's a tough problem.
>>> 
>>> On Mon, Dec 8, 2014 at 8:08 AM, Tom Short <[email protected]> wrote:
>>> Exciting, John! Although your documentation may be "very sparse", the code
>>> is nicely documented.
>>> 
>>> On Mon, Dec 8, 2014 at 12:35 AM, John Myles White
>>> <[email protected]> wrote:
>>> Over the last month or so, I've been
>>> slowly working on a new library that defines an abstract toolkit for
>>> writing CSV parsers. The goal is to provide an abstract interface that
>>> users can implement in order to provide functions for reading data into
>>> their preferred data structures from CSV files. In principle, this
>>> approach should allow us to unify the code behind Base's readcsv and
>>> DataFrames's readtable functions.
>>> 
>>> The library is still very much a work-in-progress, but I wanted to let
>>> others see what I've done so that I can start getting feedback on the
>>> design.
>>> 
>>> Because the library makes heavy use of Nullables, you can only try out the
>>> library on Julia 0.4. If you're interested, it's available at
>>> https://github.com/johnmyleswhite/CSVReaders.jl
>>> 
>>> For now, I've intentionally given very sparse documentation to discourage
>>> people from seriously using the library before it's officially released.
>>> But there are some examples in the README that should make clear how the
>>> library is intended to be used.
>>> 
>>> -- John
>
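
Tim's suggestion in the thread (read into a 1d vector with `append!`, then reshape and transpose) can be sketched roughly as follows. This is only an illustration in current Julia syntax, not code from CSVReaders.jl: the function name `read_numeric_csv`, the `delim` keyword, and the assumption that every field parses as a `Float64` are all hypothetical.

```julia
# Hypothetical helper, not part of CSVReaders.jl: parse purely numeric CSV
# text into a Matrix{Float64} by accumulating values one row at a time.
function read_numeric_csv(io::IO; delim::Char = ',')
    buf = Float64[]      # flat 1d buffer, filled in file (row-major) order
    ncols = 0
    nrows = 0
    for line in eachline(io)
        isempty(strip(line)) && continue      # skip blank lines
        fields = split(line, delim)
        if nrows == 0
            ncols = length(fields)            # first row fixes the width
        elseif length(fields) != ncols
            error("row $(nrows + 1) has $(length(fields)) fields, expected $ncols")
        end
        # A Vector can grow in place, which a Matrix cannot (problem B above).
        append!(buf, (parse(Float64, strip(f)) for f in fields))
        nrows += 1
    end
    # The buffer holds row 1, then row 2, and so on. Julia arrays are
    # column-major, so reshape to ncols x nrows: each file row is a column.
    by_columns = reshape(buf, ncols, nrows)
    # Materialize the transpose to get the usual nrows x ncols layout.
    return permutedims(by_columns)
end

m = read_numeric_csv(IOBuffer("1,2,3\n4,5,6\n"))
```

The final `permutedims` allocates a second array of the same size, which is the "twice as much memory" cost mentioned in the thread.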
