Re: [julia-users] Re: documentation suggestions

Ariel Katz Thu, 11 Feb 2016 13:40:41 -0800

Hello,

With regard to your specific point about CSV I/O, there are several 
ways to read CSV files in Julia.


- DataFrames.jl 
<https://github.com/JuliaStats/DataFrames.jl/blob/712e3876507228552ec83a371a5d0e577c75c183/doc/source/io.rst>:

using DataFrames
df = readtable("data.csv")


- Base:

readdlm(source, delim::Char, T::Type; options...)

- And the current state of the art with regard to speed: CSV.jl 
<https://github.com/JuliaDB/CSV.jl>, with its DataStreams integration.
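
As a rough sketch of the Base route (the file contents and the names 
path, data, and header are made up for illustration): with header=true, 
readdlm returns the data matrix together with a one-row matrix of column 
names. On Julia 0.7 and later readdlm lives in the DelimitedFiles standard 
library, hence the using line; on releases where readdlm is still in Base 
(0.6 and earlier), drop that line.

```julia
using DelimitedFiles  # Julia 0.7+ only; on 0.6 and earlier readdlm is in Base

# Write a tiny illustrative CSV file to a temporary path (hypothetical data).
path = tempname()
open(path, "w") do io
    write(io, "x,y\n1.0,2.0\n3.0,4.0\n")
end

# Read it back as Float64, treating the first line as column names.
# With header=true the result is a (data, header) tuple.
data, header = readdlm(path, ',', Float64; header=true)

println(size(data))   # dimensions of the numeric matrix
println(header)       # one-row matrix holding the column names
```

This gives you a plain matrix rather than a DataFrame, so it suits purely 
numeric files best.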

Unless you are reading fairly large CSV files, I would stick to 
DataFrames.jl.

I would caution you, though, that data I/O in Julia is still in its infancy, 
and some methods are either slower than their Python/R counterparts or 
missing altogether (xls, etc.).

Zooming out a bit, I've found DataWookie's Month of Julia 
<https://github.com/DataWookie/MonthOfJulia> blog series to be the best 
getting-started guide for practical data sciency Julia stuff.

On Thursday, February 11, 2016 at 4:06:45 PM UTC-5, ivo welch wrote:
>
>
> hi doug---and vice-versa.  it's interesting that a core function (reading 
> a .csv file) would not be in a native julia library.  when are you 
> switching your students to julia?  regards,  /iaw
>
>
> ----
> Ivo Welch ([email protected])
> http://www.ivo-welch.info/
> J. Fred Weston Distinguished Professor of Finance
> Anderson School at UCLA, C519
> Free Finance Textbook, http://book.ivo-welch.info/
> Exec Editor, Critical Finance Review, 
> http://www.critical-finance-review.org/
> Editor and Publisher, FAMe, http://www.fame-jagazine.com/
>
> On Thu, Feb 11, 2016 at 12:37 PM, Douglas Bates <[email protected]> wrote:
>
>> Hi Ivo,
>>
>> Good to hear from you.
>>
>> On Wednesday, February 10, 2016 at 9:58:37 AM UTC-6, ivo welch wrote:
>>>
>>>
>>> ladies and gents---I am not (yet) a julia user.
>>>
>>> may I suggest adding more examples into two places where julia users 
>>> will face starting hurdles?
>>>
>>> [1] the I/O docs of julia.  like, reading and writing csv files that are 
>>> compressed and decompressed on-the-fly, even if not in the ultimate 
>>> efficient manner.    a large fraction of the time and frustration of new 
>>> users is consumed by the task of shoehorning data into and out of new 
>>> computer languages.  with all of R's problems, the 'd <- read.csv("f.csv")' 
>>> and 'd<-read.csv(pipe(paste("gzcat ", fname)))' reduced this entry 
>>> frustration greatly.  perhaps xml file reading and writing.  perhaps...
>>>
>>> [2] more 'standard task' programs would be great.  read a csv file, run 
>>> a regression according to variable names on the command line, print output, 
>>> draw a graph.  I know there are fragments throughout the docs, but some 
>>> section with ready to run complete programs would be good, perhaps at the 
>>> end of the manual.
>>>
>>> in a year, I hope to switch my students from R to julia.
>>>
>>
>> My main use of the RCall package is to import datasets from R into 
>> Julia.  If I have a dataset in an R package I use, e.g.
>>
>> julia> using RCall
>>
>> julia> ds = rcopy("lme4::Dyestuff")
>> 30x2 DataFrames.DataFrame
>> | Row | Batch | Yield  |
>> |-----|-------|--------|
>> | 1   | "A"   | 1545.0 |
>> | 2   | "A"   | 1440.0 |
>> | 3   | "A"   | 1440.0 |
>> | 4   | "A"   | 1520.0 |
>> | 5   | "A"   | 1580.0 |
>> | 6   | "B"   | 1540.0 |
>> | 7   | "B"   | 1555.0 |
>> | 8   | "B"   | 1490.0 |
>> | 9   | "B"   | 1560.0 |
>> | 10  | "B"   | 1495.0 |
>> | 11  | "C"   | 1595.0 |
>> | 12  | "C"   | 1550.0 |
>> | 13  | "C"   | 1605.0 |
>> | 14  | "C"   | 1510.0 |
>> | 15  | "C"   | 1560.0 |
>> | 16  | "D"   | 1445.0 |
>> | 17  | "D"   | 1440.0 |
>> | 18  | "D"   | 1595.0 |
>> | 19  | "D"   | 1465.0 |
>> | 20  | "D"   | 1545.0 |
>> | 21  | "E"   | 1595.0 |
>> | 22  | "E"   | 1630.0 |
>> | 23  | "E"   | 1515.0 |
>> | 24  | "E"   | 1635.0 |
>> | 25  | "E"   | 1625.0 |
>> | 26  | "F"   | 1520.0 |
>> | 27  | "F"   | 1455.0 |
>> | 28  | "F"   | 1450.0 |
>> | 29  | "F"   | 1480.0 |
>> | 30  | "F"   | 1445.0 |
>>
>> If I wanted to read a CSV file using the facilities in R I could use
>>
>> julia> rcopy("read.csv('/usr/share/distro-info/debian.csv')")
>> 17x6 DataFrames.DataFrame
>> | Row | version | codename       | series         | created      | release      | eol          |
>> |-----|---------|----------------|----------------|--------------|--------------|--------------|
>> | 1   | 1.1     | "Buzz"         | "buzz"         | "1993-08-16" | "1996-06-17" | "1997-06-05" |
>> | 2   | 1.2     | "Rex"          | "rex"          | "1996-06-17" | "1996-12-12" | "1998-06-05" |
>> | 3   | 1.3     | "Bo"           | "bo"           | "1996-12-12" | "1997-06-05" | "1999-03-09" |
>> | 4   | 2.0     | "Hamm"         | "hamm"         | "1997-06-05" | "1998-07-24" | "2000-03-09" |
>> | 5   | 2.1     | "Slink"        | "slink"        | "1998-07-24" | "1999-03-09" | "2000-10-30" |
>> | 6   | 2.2     | "Potato"       | "potato"       | "1999-03-09" | "2000-08-15" | "2003-07-30" |
>> | 7   | 3.0     | "Woody"        | "woody"        | "2000-08-15" | "2002-07-19" | "2006-06-30" |
>> | 8   | 3.1     | "Sarge"        | "sarge"        | "2002-07-19" | "2005-06-06" | "2008-03-30" |
>> | 9   | 4.0     | "Etch"         | "etch"         | "2005-06-06" | "2007-04-08" | "2010-02-15" |
>> | 10  | 5.0     | "Lenny"        | "lenny"        | "2007-04-08" | "2009-02-14" | "2012-02-06" |
>> | 11  | 6.0     | "Squeeze"      | "squeeze"      | "2009-02-14" | "2011-02-06" | "2014-05-31" |
>> | 12  | 7.0     | "Wheezy"       | "wheezy"       | "2011-02-06" | "2013-05-04" | ""           |
>> | 13  | 8.0     | "Jessie"       | "jessie"       | "2013-05-04" | "2015-04-25" | ""           |
>> | 14  | 9.0     | "Stretch"      | "stretch"      | "2015-04-25" | ""           | ""           |
>> | 15  | 10.0    | "Buster"       | "buster"       | "2018-07-01" | ""           | ""           |
>> | 16  | NA      | "Sid"          | "sid"          | "1993-08-16" | ""           | ""           |
>> | 17  | NA      | "Experimental" | "experimental" | "1993-08-16" | ""           | ""           |
>>
>>
>> (It turns out that R's allowing either ' or " for enclosing strings is an 
>> advantage for quoting strings within strings.)
>>
>
>
