I have a 25 MB CSV text file that I want to process.  Simply running
(time (dorun (read-lines "file"))) takes about 1 second, which I think
is about as fast as it will get on my machine.
I believe it should be possible to overlap the IO cost of reading the
file with the processing cost, so that I could do almost 1 second of
processing on the data entirely in parallel with the read.  But I
can't make it work!

I was trying things like
(let [lines (read-lines "file")]
 (future (dorun lines)) ; pre-fetch lines in the background
 (time (dorun (map some-func lines))))

This is a bit hacky, but in my mind it should basically work.
(As an aside, how does the seq caching work?  Where in the code is it
implemented?)
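
To make the caching question concrete, here is a minimal sketch of the
behaviour I am relying on.  The names calls and xs are made up for this
example and are not part of my real code.  It only shows that a lazy
seq computes each element once and then hands back the stored value;
it says nothing about where that is implemented.

```clojure
;; Counter that records how many times the mapping function actually runs.
(def calls (atom 0))

;; A lazy seq whose elements bump the counter when they are first computed.
(def xs (map (fn [x] (swap! calls inc) x) (range 3)))

(dorun xs) ; first walk: realizes all three elements
(dorun xs) ; second walk: reuses the cached elements, no recomputation

(println @calls) ; prints 3
```

This is why I expected the background dorun in the future to leave the
lines already realized for the map that follows.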

But it doesn't work :(  The time it takes to map some-func across the
lines is IO time + compute time, not (max IO-time compute-time).  If I
sleep for a while between starting the future and the timed map, the
measured time goes way down.

This also leads me to think that it would be useful to have a function
that precaches a lazy seq, i.e.
(pre-cache-seq 5 (range 1000)) ; returns a new lazy seq that stays up
to 5 elements ahead of the consumer by precaching on another thread.
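
For illustration, clojure.core has a function called seque that looks
close to this idea: per its docstring it creates a queued seq on
another seq and fills a queue up to n items ahead of the consumer from
a background thread.  The sketch below assumes that is good enough and
simply wraps it; pre-cache-seq is my hypothetical name, not an existing
function, and I have not measured whether it actually overlaps the IO
in the file-reading case.

```clojure
;; Hypothetical helper from the post, sketched on top of clojure.core/seque.
;; n    - how many elements to keep queued ahead of the consumer
;; coll - the (presumably lazy) seq to read ahead on
(defn pre-cache-seq
  [n coll]
  (seque n coll))

;; Usage: consume the read-ahead seq exactly like the original one.
(def result (doall (map inc (pre-cache-seq 5 (range 1000)))))

(println (take 3 result)) ; prints (1 2 3)
```

Because seque does its read-ahead on an agent thread, a standalone
script may want to call (shutdown-agents) once it is finished so the
JVM exits promptly.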

Any ideas on what I'm doing wrong?

Thanks,
Brad

PS - I don't really care about 1 second of runtime, it's the concept I
care about.