dear tim, lex, todd (& others): thanks for responding. I really want
to learn how to preprocess input from somewhere else and feed it into
the readcsv() function. it's a good starting exercise for me to learn
how to accomplish tasks in general. there is so much to learn. [I did
not experiment with GZip.jl --- modules are new to me, and this one is
not included by default. I could make too many errors in the process,
although it would probably make this specific task easier.]
now, the first mistake, which tripped me up for a while, is that I did
not grasp the difference between a string and a command. that is, I
should not have used " for my command. I needed to use ` instead. this
is why open("echo hi") did not work, but open(`echo hi`) does.
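the string/command distinction above can be sketched as follows. this is a minimal sketch, assuming a recent Julia 1.x release and a unix-like system where `echo` is available; in the Julia version used in this thread the call for capturing output was, I believe, readall(c) rather than read(c, String).

```julia
s = "echo hi"   # a string: open(s) would look for a file literally named "echo hi"
c = `echo hi`   # a command (Cmd): describes a program to run; nothing executes yet

println(typeof(c))     # the backtick literal builds a Cmd object
println(isa(s, Cmd))   # a quoted string is never a Cmd

# read(c, String) runs the command and returns its standard output as a string
out = read(c, String)
print(out)
```

the point is only that the two literals produce different types, so functions like open() dispatch to entirely different methods for them.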
x=open(`gzcat myfile.csv.gz`)
is a good start. I see it returns a tuple of a Pipe and a Process,
which is printed by default on the command line. I learned I can make
this work with
d=readcsv( x[1] )
but I now have a whole bunch of new questions, beyond my original one.
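for comparison, here is a sketch of the same decompress-then-parse step in a recent Julia 1.x release. the assumptions are mine, not from this thread: readcsv() is replaced by readdlm() from the DelimitedFiles standard library, `gzip -dc` stands in for gzcat, and the sample data and temporary file names are made up so the example is self-contained.

```julia
using DelimitedFiles   # standard library in Julia 1.x; provides readdlm

# build a small compressed file so the sketch is self-contained
dir = mktempdir()
csv = joinpath(dir, "d.csv")
write(csv, "1,2,3,4\n5,6,7,8\n9,10,11,12\n")
run(`gzip $csv`)        # replaces d.csv with d.csv.gz
gz = csv * ".gz"

# open(f, cmd) starts the command, passes its output stream to f,
# and waits for the process to finish once f returns
d = open(`gzip -dc $gz`) do io
    readdlm(io, ',')    # parse comma-separated text straight from the stream
end

println(size(d))
```

the do-block form avoids holding on to the stream and the process separately, since both are cleaned up when the block ends.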
first, try this:
julia> x1=open(`gzcat d.csv.gz`)
(Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, ProcessExited(0)))
julia> x2=open(`gzcat d.csv.gz`)
(Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning))
how strange---the reported states are different. even stranger, the
first readcsv(x2[1]) is very slow (I am talking 3 seconds on a 3 by 4
data file!), but following it with readcsv(x1[1]) is fast. I can't
imagine readcsv has built-in intelligence to cache specific past
conversions.
another oddity from a novice's perspective: close(x1) is
not defined, but close(x1[1]) is. julia is the first language I have
seen where a close(open(...)) on a command is wrong. this is esp.
surprising because julia has the dispatch ability to understand what it
should do with close() on a (Pipe,Process) tuple. the same holds true
for other functions that expect only one part of what open returns.
julia should be smart enough to know this.
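to make the dispatch idea concrete, here is a minimal sketch in Julia 1.x syntax. close_stream is a hypothetical helper of my own, not part of Base, and the (IOBuffer, nothing) pair is only a stand-in for the (Pipe, Process) tuple shown above. as far as I know, recent Julia versions return a single object from open() on a command, so this illustrates the mechanism rather than something one would need there.

```julia
# hypothetical helper, not part of Base: given a (stream, anything) tuple,
# close the stream half, which is what close(x[1]) does by hand
close_stream(x::Tuple{IO,Any}) = close(x[1])

# plain streams are closed directly, so callers need not care which they hold
close_stream(io::IO) = close(io)

# stand-in for the (Pipe, Process) pair; `nothing` plays the Process role
pair = (IOBuffer(), nothing)
close_stream(pair)
println(isopen(pair[1]))
```

a separately named helper is used here instead of adding a method to Base.close, because extending a Base function for Base types from user code affects every other piece of code in the session.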
regards,
/iaw
----
Ivo Welch ([email protected])
http://www.ivo-welch.info/
J. Fred Weston Distinguished Professor of Finance
Anderson School at UCLA, C519
Director, UCLA Anderson Fink Center for Finance and Investments
Free Finance Textbook, http://book.ivo-welch.info/
Exec Editor, Critical Finance Review, http://www.critical-finance-review.org/
Editor and Publisher, FAMe, http://www.fame-jagazine.com/
On Sun, Jan 4, 2015 at 6:29 PM, Todd Leo <[email protected]> wrote:
> An intuitive thought is, uncompress your csv file via bash utility zcat,
> pipe it to STDIN and use readline(STDIN) in julia.
>
>
>
> On Monday, January 5, 2015 7:51:18 AM UTC+8, ivo welch wrote:
>>
>>
>> dear julia users: beginner's question (apologies, more will be coming).
>> it's probably obvious.
>>
>> I am storing files in compressed csv form. I want to use the built-in
>> julia readcsv() function. but I also need to pipe through a decompressor
>> first. so, I tried a variety of forms, like
>>
>> d= readcsv("/usr/bin/gzcat ./myfile.csv.gz |")
>> d= readcsv("`/usr/bin/gzcat ./myfile.csv.gz`")
>>
>> I can type the file with run(`/usr/bin/gzcat ./crsp90.csv.gz`), but
>> wrapping a readcsv around it does not capture it. how does one do this?
>>
>> regards,
>>
>> /iaw
>>
>