Re: [julia-users] Re: reading compressed csv file?

elextr Sun, 04 Jan 2015 23:47:57 -0800


On Monday, January 5, 2015 4:46:15 PM UTC+10, ivo welch wrote:
>
> dear tim, lex, todd (&others):  thanks for responding.  I really want 
> to learn how to preprocess input from somewhere else into the 
> readcsv() function.  it's a good starting exercise for me to learn how 
> to accomplish tasks in general.  there is so much to learn.  [I did 
> not experiment with GZip.jl --- modules are new to me, and this one is 
> not included.  I could make too many errors in this process.  It will 
> probably make the specific task easier.] 
>
> now, the first mistake which tripped me up for a while is that I did 
> not grasp the difference between a string and a command.  that is, I 
> should not have used " for my command.  I had needed to use `.  this 
> is why open("echo hi") did not work, but open(`echo hi`) does. 
>


Yep, correct.
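
To make the distinction concrete, here is a small sketch.  It assumes a
current Julia 1.x release and a Unix-like system with echo on the PATH.
Double quotes build a string, which open() treats as a file name; backticks
build a Cmd object, which is run as an external program.

```julia
s = "echo hi"      # a String: open(s) would look for a file named "echo hi"
c = `echo hi`      # a Cmd: describes an external program and its arguments

println(typeof(s))
println(typeof(c))

# read(cmd, String) runs the command and collects its standard output.
out = read(c, String)
print(out)
```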
 

>
>     x=open(`gzcat myfile.csv.gz`) 
>
> is a good start.  I see it contains a tuple of a Pipe and a Process. 
> this is printed by default on the command line.  I learned I can make 
> this work with 
>
>    d=readcsv( x[1] ) 
>

Yes
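
For readers on later Julia versions, a hedged sketch of the same idea:
readcsv() was later removed from Base in favour of readdlm() in the
DelimitedFiles standard library, and open(command) there returns a single
process object rather than a tuple.  The echo command below is only a
stand-in (an assumption for this example) for `gzcat myfile.csv.gz`, so the
snippet runs without a data file.

```julia
using DelimitedFiles

# Stand-in for `gzcat myfile.csv.gz`: any command that writes CSV text to stdout.
cmd = `echo 1,2,3`

# The do-block form runs the command, passes its output stream to the block,
# and checks that the command exited successfully.
d = open(cmd) do io
    readdlm(io, ',')
end

println(size(d))
```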
 

>
> but I have a whole bunch of new questions, beyond question now. 
> first, try this: 
>
> julia> x1=open(`gzcat d.csv.gz`) 
> (Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, 
> ProcessExited(0))) 
>
> julia> x2=open(`gzcat d.csv.gz`) 
> (Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning)) 
>
> how strange---the claims are different.  


That may just be a sampling effect: gzcat runs in a separate process, 
concurrently with the Julia process.  Also see below for why the first call 
to open(command) may have been slower than the second: the first open() 
probably did not return until after gzcat had already exited, whereas the 
second ran much faster and returned while gzcat was still running.
 

> even stranger, the first 
> readcsv(x2[1]) is very slow now (I am talking 3 seconds on a 3 by 4 
> data file!); but following it with readcsv(x1[1]) is fast.  I can't 
> imagine readcsv has intelligence built-in to cache past specific 
> conversions. 
>

No, but the first time you do anything it's possible that you are hitting 
compile delays from the JIT (for open, readcsv, and everything they call); 
subsequent runs are faster. 
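
One way to see this for yourself is to time the same call twice.  A rough
sketch, where sumsq is a hypothetical helper that exists only to give the
compiler something to do; the first timing usually includes compilation and
the second usually does not, but the actual numbers depend on your machine
and Julia version.

```julia
# Hypothetical helper, used only to have something to compile.
sumsq(v) = sum(x -> x^2, v)

v = collect(1.0:1000.0)

t1 = @elapsed sumsq(v)   # first call: typically includes compiling sumsq
t2 = @elapsed sumsq(v)   # second call: reuses the already compiled code

println("first call:  ", t1, " s")
println("second call: ", t2, " s")
```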
 

>
> another strange definition from a novice perspective:  close(x1) is 
> not defined.  close(x1[1]) is.  


close() is defined for a stream, not a tuple (stream, process).
 

> julia is the first language I have 
> seen where a close(open("file")) is wrong. 


close(open("filenamestring")) is fine; close(open(command)) is not, because 
open(command) returns a tuple of two things, not just the stream.  This is 
Julia's primary paradigm: multiple dispatch means that the same named 
function can have several methods that do different things depending on the 
*type* of the arguments to the call, here a string or a command.
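
A minimal illustration of dispatching on string versus command, assuming a
current Julia 1.x release.  The function name describe is made up for this
example and is not part of Julia.

```julia
# `describe` is a hypothetical function, defined here only for illustration.
# Two methods share one name; Julia picks one from the argument's type.
describe(x::AbstractString) = "a string (open would treat it as a file name)"
describe(x::Cmd) = "a command (open would run it as a program)"

println(describe("echo hi"))
println(describe(`echo hi`))
```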
 

>  this is esp surprising 
> because julia has the dispatch ability to understand what it could do 
> with a close(Pipe,Process) tuple. 


But only if such a close() method is defined, which it is not.  Maybe it 
should be, but open(command) is used far less often than open(file).
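
If you want that behaviour you can add a method yourself.  A minimal sketch,
assuming a current Julia 1.x release; it uses a hypothetical helper name,
close_all, rather than extending Base.close, and IOBuffer objects as
stand-ins for the pipe so that it runs without an external command.

```julia
# Hypothetical helper: call close() on every element of a tuple.
close_all(t::Tuple) = foreach(close, t)

a = IOBuffer()
b = IOBuffer()
close_all((a, b))

println(isopen(a), " ", isopen(b))
```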

Cheers
Lex

 

>  the same holds true for other 
> functions that expect a part of open.  julia should be smart enough to 
> know this. 
>
> regards, 
>
> /iaw 
>
> ---- 
> Ivo Welch 
> http://www.ivo-welch.info/ 
> J. Fred Weston Distinguished Professor of Finance 
> Anderson School at UCLA, C519 
> Director, UCLA Anderson Fink Center for Finance and Investments 
> Free Finance Textbook, http://book.ivo-welch.info/ 
> Exec Editor, Critical Finance Review, 
> http://www.critical-finance-review.org/ 
> Editor and Publisher, FAMe, http://www.fame-jagazine.com/ 
>
>
> On Sun, Jan 4, 2015 at 6:29 PM, Todd Leo wrote: 
> > An intuitive thought is, uncompress your csv file via bash utility zcat, 
> > pipe it to STDIN and use readline(STDIN) in julia. 
> > 
> > 
> > 
> > On Monday, January 5, 2015 7:51:18 AM UTC+8, ivo welch wrote: 
> >> 
> >> 
> >> dear julia users:  beginner's question (apologies, more will be 
> coming). 
> >> it's probably obvious. 
> >> 
> >> I am storing files in compressed csv form.  I want to use the built-in 
> >> julia readcsv() function.  but I also need to pipe through a 
> decompressor 
> >> first.  so, I tried a variety of forms, like 
> >> 
> >>    d= readcsv("/usr/bin/gzcat ./myfile.csv.gz |") 
> >>    d= readcsv("`/usr/bin/gzcat ./myfile.csv.gz`") 
> >> 
> >> I can type the file with run(`/usr/bin/gzcat ./crsp90.csv.gz`), but 
> >> wrapping a readcsv around it does not capture it.  how does one do 
> this? 
> >> 
> >> regards, 
> >> 
> >> /iaw 
> >> 
> > 
>
