On Fri, Aug 17, 2012 at 10:53 PM, David Jacobs <da...@wit.io> wrote:
> Okay, that's great. Thanks, you guys. Was read-lines only holding onto
> the head of the line seq because I bound it in the let statement?

Yeah... I think so... I don't know whether that's a case the compiler's
"locals clearing" handles. In any event, that's why I chose to pass
the lazy sequence directly to the called function without binding it
in a let first.
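For comparison, here is a minimal sketch that keeps the let binding from the original read-lines and changes only one thing. It wraps the lookups in doall, which forces them while the reader is still open and is enough to avoid the "Stream closed" error. It does not settle the locals-clearing question, and it keeps the repeated nth calls, each of which walks the sequence from its head. The parameter is renamed `source` (an assumption for illustration) because clojure.java.io/reader accepts a path, a File, or a Reader, so an in-memory java.io.StringReader can stand in for the large file.

```clojure
(ns nthlines.let-variant
  (:require [clojure.java.io :as io]))

;; Unchanged from the question: one nth lookup per requested index.
(defn multi-nth [values indices]
  (map (partial nth values) indices))

;; The question's read-lines with a single change: doall realizes every
;; lookup before with-open closes the reader. The let binding is kept.
(defn read-lines [source indices]
  (with-open [rdr (io/reader source)]
    (let [lines (line-seq rdr)]
      (doall (multi-nth lines indices)))))

(comment
  ;; Returns ("a" "b") instead of throwing "Stream closed".
  (read-lines (java.io.StringReader. "a\nb\nc") [0 1]))
```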

// Ben

> On Fri, Aug 17, 2012 at 11:09 AM, Ben Smith-Mannschott
> <bsmith.o...@gmail.com> wrote:
>> On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs <da...@wit.io> wrote:
>>> I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file
>>> with Clojure.
>>>
>>> So far I've got:
>>>
>>> (defn multi-nth [values indices]
>>>   (map (partial nth values) indices))
>>>
>>> (defn read-lines [file indices]
>>>   (with-open [rdr (clojure.java.io/reader file)]
>>>     (let [lines (line-seq rdr)]
>>>       (multi-nth lines indices))))
>>>
>>> Now, (read-lines "my-file" [0]) works without a problem. However, passing in
>>> [0 1] gives me the following error: "java.lang.RuntimeException:
>>> java.io.IOException: Stream closed"
>>>
>>> It seems that the stream is being closed before I can read the second line
>>> from the file. Interestingly, if I manually pull out a line from the file
>>> with something like `(nth lines 200)`, the `multi-nth` call works for all
>>> values <= 200.
>>>
>>> Any idea what's going on?
>>>
>>> PS This question is on SO if someone wants points:
>>> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
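The failure can be reproduced without a 1GB file. The sketch below is hedged: it swaps the file for an in-memory java.io.StringReader, and the helper name `realize-after-close` is hypothetical. It uses the question's code unchanged and forces the result only after read-lines has returned, which is when the REPL would print it. Index 0 succeeds because line-seq reads the first line eagerly when it is called. Index 1 needs another readLine on the already closed BufferedReader.

```clojure
(ns nthlines.repro
  (:require [clojure.java.io :as io]))

;; The question's code, unchanged apart from the parameter name.
(defn multi-nth [values indices]
  (map (partial nth values) indices))

(defn read-lines [source indices]
  (with-open [rdr (io/reader source)]
    (let [lines (line-seq rdr)]
      (multi-nth lines indices))))

;; Hypothetical helper: realize the lazy result after with-open has
;; already closed the reader. Returns the lines, or the exception message.
(defn realize-after-close [text indices]
  (try
    (doall (read-lines (java.io.StringReader. text) indices))
    (catch Exception e
      (.getMessage e))))

(comment
  (realize-after-close "a\nb\nc" [0])    ; line 0 was read by line-seq itself
  (realize-after-close "a\nb\nc" [0 1])) ; line 1 needs a read after close
```

The same reasoning suggests why `(nth lines 200)` inside with-open helps: it realizes lines 0 to 200 while the reader is open, so later lookups up to 200 never touch the closed stream.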
>>
>> The laziness of map is biting you: read-lines returns a lazy sequence
>> that has not been fully realized before with-open closes the file.
>> Also, calling nth repeatedly is not going to do wonders for efficiency,
>> since each call walks the sequence from its head. Try this on for
>> size:
>>
>>
>> (ns nthlines.core
>>   (:require [clojure.java.io :as io]))
>>
>> (defn multi-nth [values indices]
>>   (let [matches-index? (set indices)]
>>     (keep-indexed #(when (matches-index? %1) %2) values)))
>>
>> (defn read-lines [file indices]
>>   (with-open [r (io/reader file)]
>>     (doall (multi-nth (line-seq r) indices))))
>>
>> (comment
>>
>>   (def words "/Users/bsmith/w/nthlines/words.txt")
>>   (def nlines 84918960) ;; 856MB with one word per line
>>
>>   (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))
>>
>>   ;;=> "Elapsed time: 18778.904 msecs"
>>   ;;   ("A" "a" "aa" "Zyzomys" "Zyzzogeton")
>>
>> )
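The keep-indexed version above reads to the end of the file even when every requested index is near the top. It also returns matches in file order rather than in the order the indices were given, and duplicate indices yield a single line. The following is a sketch of a hypothetical `read-lines-bounded` that addresses both, assuming non-negative integer indices. It stops after the largest requested index and looks each index up in a map built while the reader is still open.

```clojure
(ns nthlines.bounded
  (:require [clojure.java.io :as io]))

;; Hypothetical variant: reads only up to the largest requested index and
;; returns lines in the order the indices were given (nil for an index
;; past the end of the file). Assumes non-negative integer indices.
(defn read-lines-bounded [source indices]
  (if (empty? indices)
    []
    (let [wanted (set indices)
          limit  (inc (apply max indices))]
      (with-open [r (io/reader source)]
        ;; into is eager, so only `limit` lines are read, all before close.
        (let [found (into {}
                          (keep-indexed (fn [i line]
                                          (when (wanted i) [i line]))
                                        (take limit (line-seq r))))]
          ;; mapv is eager too; a map used as a function returns nil
          ;; for an index that was never found.
          (mapv found indices))))))

(comment
  (read-lines-bounded (java.io.StringReader. "a\nb\nc\nd") [2 0]))
```

When the indices include lines near the end of the file, as in the timing above, this still reads nearly every line. The early stop only pays off when the largest index is small.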
>>
>> // Ben
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with your 
>> first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
