Dear Tim, Lex, Todd (& others): thanks for responding. I really want
to learn how to feed input from another source (here, a decompressor)
into the readcsv() function. It is a good first exercise for learning
how to accomplish tasks in Julia in general; there is so much to learn.
[I did not experiment with GZip.jl: packages are new to me, and this
one is not included with Julia, so I would likely make too many errors
along the way. It would probably make this specific task easier,
though.]

Now, the first mistake, which tripped me up for a while, was that I
did not grasp the difference between a string and a command: I had
used double quotes (") for my command when I needed backticks (`).
This is why open("echo hi") did not work, but open(`echo hi`) does.
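
A small sketch of the distinction, written for current Julia (1.x) and
assuming a Unix-like system with an `echo` program on the PATH. A
double-quoted literal is just a `String`, so open("echo hi") tries to
open a file literally named "echo hi", whereas a backtick literal
builds a `Cmd` object that Julia runs directly, without a shell.

```julia
# A double-quoted literal is an ordinary String; passed to open(),
# it would be treated as a file name.
s = "echo hi"

# A backtick literal is a Cmd: a command Julia runs directly, without a shell.
c = `echo hi`

println(typeof(s))   # String
println(typeof(c))   # Cmd

# Run the command and capture its standard output as a String.
out = read(c, String)
print(out)           # hi
```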

    x=open(`gzcat myfile.csv.gz`)

is a good start. It returns a tuple of a Pipe and a Process, which the
REPL prints by default. I learned that I can make this work with

    d = readcsv(x[1])
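
For readers on current Julia (1.x), a hedged sketch of the same idea:
readcsv no longer exists there, and readdlm(..., ',') from the
DelimitedFiles standard library takes its place; also, open(cmd) now
returns a single Process object (an IO stream) rather than a
(Pipe, Process) tuple. The sketch assumes the gzip tool is installed
and uses `gzip -dc` as a portable stand-in for gzcat; the file name
is hypothetical, and the file is created first so the example runs on
its own.

```julia
using DelimitedFiles

# Build a small gzip-compressed CSV so the example is self-contained.
# (Assumes the gzip command-line tool is on the PATH.)
dir = mktempdir()
plain = joinpath(dir, "myfile.csv")
write(plain, "1,2,3,4\n5,6,7,8\n9,10,11,12\n")
run(`gzip -f $plain`)            # replaces myfile.csv with myfile.csv.gz
gzpath = plain * ".gz"

# open(cmd) starts the decompressor and returns a Process whose
# standard output can be read like any other IO stream.
proc = open(`gzip -dc $gzpath`)
d = readdlm(proc, ',')           # 3x4 Matrix{Float64}
wait(proc)                       # make sure gzip has exited

println(size(d))                 # (3, 4)
```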

But now that this works, I have a whole bunch of new questions.
First, try this:

julia> x1=open(`gzcat d.csv.gz`)
(Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, ProcessExited(0)))

julia> x2=open(`gzcat d.csv.gz`)
(Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning))

How strange: the two displays report different states. Even
stranger, the first call, readcsv(x2[1]), is very slow (3 seconds on
a 3-by-4 data file!), but a following readcsv(x1[1]) is fast. I
can't imagine that readcsv has built-in intelligence to cache
specific past conversions.
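
A plausible explanation, offered as a hedge rather than a diagnosis:
the different displays most likely reflect timing (gzcat had already
written its 35 bytes and exited by the time the REPL printed x1, but
not yet when it printed x2). The slow first call is most likely
Julia's just-in-time compilation: a function is compiled the first
time it is called with given argument types, and later calls with the
same types reuse that code, whichever pipe they read from. The sketch
below times two identical readdlm calls on current Julia (1.x); the
sample data is made up, and with precompiled standard libraries the
gap may be small or absent.

```julia
using DelimitedFiles

# Hypothetical 3-by-4 sample data standing in for the decompressed file.
csv = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# The first call may include compiling readdlm for these argument types;
# the second call with the same types reuses the compiled code.
t1 = @elapsed readdlm(IOBuffer(csv), ',')
t2 = @elapsed readdlm(IOBuffer(csv), ',')
println("first call:  $t1 seconds")
println("second call: $t2 seconds")

# The parsed result is the same regardless of timing.
d = readdlm(IOBuffer(csv), ',')
println(size(d))   # (3, 4)
```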

Another surprising design, from a novice's perspective: close(x1) is
not defined, but close(x1[1]) is. Julia is the first language I have
seen where close(open(...)) on a command is an error (on an ordinary
file, close(open("file")) works). This is especially surprising
because Julia's multiple dispatch could easily define what close
should do with a (Pipe, Process) tuple. The same holds for other
functions that expect one part of what open returns. Julia should be
smart enough to handle this.
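
On current Julia (1.x), the do-block form of open sidesteps the
question of what to close: open(f, cmd) passes the process stream to
f, then closes it, waits for the command to finish, and throws an
error if the command failed. A minimal sketch, assuming gzip is
installed; the file name is hypothetical and the file is created first
so the example is self-contained.

```julia
using DelimitedFiles

# Create a small compressed CSV (assumes gzip is on the PATH).
dir = mktempdir()
plain = joinpath(dir, "d.csv")
write(plain, "1,2,3,4\n5,6,7,8\n9,10,11,12\n")
run(`gzip -f $plain`)            # produces d.csv.gz
gzpath = plain * ".gz"

# open(f, cmd): f receives the process stream; afterwards Julia closes
# the stream, waits for gzip to exit, and throws if gzip failed.
d = open(`gzip -dc $gzpath`) do io
    readdlm(io, ',')
end

println(d[2, :])                 # [5.0, 6.0, 7.0, 8.0]
```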

regards,

/iaw

----
Ivo Welch
http://www.ivo-welch.info/
J. Fred Weston Distinguished Professor of Finance
Anderson School at UCLA, C519
Director, UCLA Anderson Fink Center for Finance and Investments
Free Finance Textbook, http://book.ivo-welch.info/
Exec Editor, Critical Finance Review, http://www.critical-finance-review.org/
Editor and Publisher, FAMe, http://www.fame-jagazine.com/


On Sun, Jan 4, 2015 at 6:29 PM, Todd Leo wrote:
> An intuitive approach: decompress your CSV file with the shell
> utility zcat, pipe it to STDIN, and use readline(STDIN) in Julia.
>
> On Monday, January 5, 2015 7:51:18 AM UTC+8, ivo welch wrote:
>>
>>
>> Dear Julia users: a beginner's question (apologies, more will be
>> coming). It's probably obvious.
>>
>> I am storing files in compressed CSV form. I want to use the built-in
>> Julia readcsv() function, but I also need to pipe the data through a
>> decompressor first. So I tried a variety of forms, like
>>
>>    d= readcsv("/usr/bin/gzcat ./myfile.csv.gz |")
>>    d= readcsv("`/usr/bin/gzcat ./myfile.csv.gz`")
>>
>> I can print the file with run(`/usr/bin/gzcat ./crsp90.csv.gz`), but
>> wrapping readcsv around it does not capture the output. How does one
>> do this?
>>
>> regards,
>>
>> /iaw
>>
>
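
Todd's stdin suggestion can be sketched on current Julia (1.x) as
follows; the helper names are hypothetical. In a script run as
`gzip -dc myfile.csv.gz | julia thisscript.jl`, one would pass
`stdin` to either function; here an IOBuffer stands in for stdin so
the example runs on its own. The line-by-line variant mirrors the
readline(STDIN) idea, while readdlm reads the whole stream at once.

```julia
using DelimitedFiles

# Whole-stream approach: readdlm accepts any IO, including stdin.
read_csv_stream(io::IO) = readdlm(io, ',')

# Line-by-line approach, closer to the readline(STDIN) suggestion:
# parse each comma-separated line into a vector of Float64.
function read_csv_lines(io::IO)
    rows = Vector{Vector{Float64}}()
    for line in eachline(io)
        isempty(strip(line)) && continue   # skip blank lines
        push!(rows, parse.(Float64, split(line, ',')))
    end
    return rows
end

# In a real script: d = read_csv_stream(stdin)
sample = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"
d = read_csv_stream(IOBuffer(sample))
rows = read_csv_lines(IOBuffer(sample))
println(size(d))      # (3, 4)
println(rows[2])      # [5.0, 6.0, 7.0, 8.0]
```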
