On Fri, Aug 17, 2012 at 10:53 PM, David Jacobs <da...@wit.io> wrote:
> Okay that's great. Thanks, you guys. Was read-lines only holding onto
> the head of the line seq because I bound it in the let statement?

Yeah... I think so... I don't know if that's a case that the compiler's
"locals clearing" handles. In any event, that's why I chose to pass the
lazy sequence directly to the called function without binding it in a
let first.

// Ben

> On Fri, Aug 17, 2012 at 11:09 AM, Ben Smith-Mannschott
> <bsmith.o...@gmail.com> wrote:
>> On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs <da...@wit.io> wrote:
>>> I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file
>>> with Clojure.
>>>
>>> So far I've got:
>>>
>>> (defn multi-nth [values indices]
>>>   (map (partial nth values) indices))
>>>
>>> (defn read-lines [file indices]
>>>   (with-open [rdr (clojure.java.io/reader file)]
>>>     (let [lines (line-seq rdr)]
>>>       (multi-nth lines indices))))
>>>
>>> Now, (read-lines "my-file" [0]) works without a problem. However, passing in
>>> [0 1] gives me the following error: "java.lang.RuntimeException:
>>> java.io.IOException: Stream closed"
>>>
>>> It seems that the stream is being closed before I can read the second line
>>> from the file. Interestingly, if I manually pull out a line from the file
>>> with something like `(nth lines 200)`, the `multi-nth` call works for all
>>> values <= 200.
>>>
>>> Any idea what's going on?
>>>
>>> PS This question is on SO if someone wants points:
>>> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
>>
>> The laziness of map is biting you. The result of read-lines will not
>> have been fully realized before the file is closed. Also, calling nth
>> repeatedly is not going to do wonders for efficiency. Try this on for
>> size:
>>
>>
>> (ns nthlines.core
>>   (:require [clojure.java.io :as io]))
>>
>> (defn multi-nth [values indices]
>>   (let [matches-index? (set indices)]
>>     (keep-indexed #(when (matches-index? %1) %2) values)))
>>
>> (defn read-lines [file indices]
>>   (with-open [r (io/reader file)]
>>     (doall (multi-nth (line-seq r) indices))))
>>
>> (comment
>>
>>   (def words "/Users/bsmith/w/nthlines/words.txt")
>>   (def nlines 84918960) ;; 856MB with one word per line
>>
>>   (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))
>>
>>   ;;=> "Elapsed time: 18778.904 msecs"
>>   ;;   ("A" "a" "aa" "Zyzomys" "Zyzzogeton")
>>
>> )
>>
>> // Ben
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with your
>> first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
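
For anyone who wants to reproduce the "Stream closed" behaviour discussed
in this thread without a 1GB file, here is a minimal sketch. It assumes an
in-memory reader stands in for the file. The names sample-text,
read-lines-lazy and read-lines-eager are made up for illustration and do
not appear in the thread; multi-nth follows the keep-indexed idea from the
reply. It only shows the laziness problem and the doall fix, and makes no
claim about locals clearing or head retention.

```clojure
(import '(java.io BufferedReader StringReader))

;; Hypothetical stand-in for a large file: five one-character lines.
(def sample-text "a\nb\nc\nd\ne\n")

(defn multi-nth
  "Single lazy pass over values, keeping elements whose index is in indices."
  [values indices]
  (let [wanted? (set indices)]
    (keep-indexed (fn [i v] (when (wanted? i) v)) values)))

(defn read-lines-lazy
  "Returns the lazy seq unrealized, so the reader is already closed by the
  time a caller walks past the first line."
  [text indices]
  (with-open [r (BufferedReader. (StringReader. text))]
    (multi-nth (line-seq r) indices)))

(defn read-lines-eager
  "Forces the result with doall while the reader is still open."
  [text indices]
  (with-open [r (BufferedReader. (StringReader. text))]
    (doall (multi-nth (line-seq r) indices))))

(comment

  (read-lines-eager sample-text [0 2 4])
  ;;=> ("a" "c" "e")

  ;; line-seq reads the first line eagerly, so index 0 is still available:
  (first (read-lines-lazy sample-text [0 1]))
  ;;=> "a"

  ;; Realizing further calls .readLine on the closed reader and fails with
  ;; a "Stream closed" IOException (possibly wrapped in a RuntimeException,
  ;; depending on the Clojure version):
  (doall (read-lines-lazy sample-text [0 1]))

)
```

This matches the symptom in the original question: the first line is read
when line-seq is called, which is presumably why [0] appeared to work while
[0 1] did not.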