I have a 25 MB CSV text file that I want to process. Simply running (time (dorun (read-lines "file"))) gives me about 1 second of read time, which I think is about as fast as it will get on my machine. I believe it should be possible to overlap the IO cost of reading the file with the processing cost, so that I could do almost 1 second of processing on the data entirely in parallel with the read. But I can't get it to work!
I was trying things like

    (let [lines (read-lines "file")]
      (future (dorun lines)) ; pre-fetch lines in the background
      (time (dorun (map some-func lines))))

which is a bit hacky, but should basically work in my mind. (As an aside, how does the seq caching work? Where in the code is it implemented?)

But it doesn't work :( The time it takes to map some-func across the seq is IO + compute, not (max IO-time compute-time). If I sleep for a while between starting the future and running the map, the compute time goes way down.

This also leads me to think that it would be useful to have a function that pre-caches a lazy seq, i.e.

    (pre-cache-seq 5 (range 1000)) ; returns a new lazy seq that stays 5 elements ahead by pre-caching on another thread

Any ideas on what I'm doing wrong?

Thanks,
Brad

PS - I don't really care about 1 second of runtime, it's the concept I care about.
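For reference, the pre-cache-seq idea above looks close to what clojure.core/seque does: it fills a bounded queue from a background thread and can run up to n items ahead of the consumer. Below is a minimal sketch of how the two approaches might be compared. The names slow-source and some-func are hypothetical stand-ins (a fake slow reader and a fake per-line computation), not the real file or function from the post, and the sleep durations are arbitrary.

```clojure
;; Sketch only. slow-source and some-func are hypothetical stand-ins
;; for the file reader and the per-line work; timings are arbitrary.

(defn slow-source
  "Unchunked lazy seq of 0..n-1 where realizing each item sleeps for ms
  milliseconds, to imitate IO latency."
  [n ms]
  (letfn [(step [i]
            (lazy-seq
              (when (< i n)
                (Thread/sleep (long ms))
                (cons i (step (inc i))))))]
    (step 0)))

(defn some-func
  "Imitates 10 ms of per-item computation, then squares the item."
  [x]
  (Thread/sleep 10)
  (* x x))

;; Baseline: source and computation run on the same thread,
;; so the elapsed time should be about IO + compute.
(time (dorun (map some-func (slow-source 50 10))))

;; seque realizes the source on a background thread, up to 5 items
;; ahead of the consumer, so IO and compute can overlap.
(time (dorun (map some-func (seque 5 (slow-source 50 10)))))

;; seque uses an agent internally; a standalone script may need
;; (shutdown-agents) before it can exit promptly.
```

If the overlap works as hoped, the second timing should come out closer to (max IO-time compute-time) than to the sum, though the exact numbers will depend on the machine and on how the real source behaves.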