So in case anyone else stumbles across this topic, I thought I'd share what little I have learned about the laziness of concat, and by extension mapcat, as used in this function.

(defn json-seq [dir-name]
  (mapcat #(do (print "f")
               (str/split (slurp %) #"\nStatusJSONImpl"))
          (out-files dir-name)))

It seems that because of the particular way concat is written, it keeps looking ahead by two or three items. However this doesn't appear to be a necessary aspect of its behaviour. So the following version of json-seq, incorporating what is essentially a rewritten concat, doesn't suffer from the same problem.

(defn json-seq [dir-name]
  (letfn [(cat [xs fs]
            (lazy-seq
              (if-let [xs (seq xs)]
                (cons (first xs) (cat (rest xs) fs))
                (if-let [fs (seq fs)]
                  (cat (do (print "f")
                           (str/split (slurp (first fs)) #"\nStatusJSONImpl"))
                       (rest fs))))))]
    (cat '() (out-files dir-name))))

Alistair

On Jul 26, 2:53 pm, atucker <agjf.tuc...@googlemail.com> wrote:
> Hi all!  I have been trying to use Clojure on a student project, but
> it's becoming a bit of a nightmare.  I wonder whether anyone can
> help?  I'm not studying computer science, and I really need to be
> getting on with the work I'm actually supposed to be doing :)
>
> I am trying to work from a lot of Twitter statuses that I saved to
> text file.  (Unfortunately I failed to escape quotes and such, so the
> JSON is not valid.  Anyone know a good way of coping with that?)
>
> Here is my function:
>
> (defn json-seq []
>   (apply concat
>          (map #(do (print "f") (str/split (slurp %) #"\nStatusJSONImpl"))
>               out-files)))
>
> Now there are forty files and five thousand statuses per file, which
> sounds like a lot, and I don't suppose I can hope to hold them all in
> memory at the same time.  But I had thought that my function might
> produce a lazy sequence that would be more manageable.  However I
> typically get:
>
> twitter.core> (nth (json-seq dir-name) 5)
> ffff"{createdAt=Fri .... etc.  GOOD
>
> twitter.core> (nth (json-seq dir-name) 5000)
> ffff
> Java heap space
>   [Thrown class java.lang.OutOfMemoryError]  BAD
>
> And at this point my REPL is done for.  Any further instruction will
> result in another OutOfMemoryError.  (Surely that has to be a bug just
> there?  Has the garbage collector just given up?)
>
> Anyway I am thinking that the sequence is not behaving as lazily as I
> need it to.  It's not reading one file at a time, and it's not reading
> thirty-two as I might expect from "chunks", but something in the
> middle.  I did try the "dechunkifying" code from page 339 of "Joy of
> Clojure", but that doesn't compile at all :(
>
> I do seem to keep running into memory problems with Clojure.  I have
> 2GB RAM and am using Snow Leopard, Aquamacs 2.0, Clojure 1.2.0 beta1
> and Leiningen 1.2.0.
>
> Cheers
> Alistair
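
For anyone who wants the one-at-a-time behaviour as a reusable function, here is a minimal sketch. It is not the code from the post above: lazy-mapcat, load-items, calls and items are all hypothetical names, and load-items merely stands in for slurping and splitting a file. It relies on lazy-cat, which wraps each of its arguments in lazy-seq, so the function should only be called on the next element once the items from the previous one have been used up.

```clojure
;; Hypothetical helper, not part of clojure.core and not the code from the post.
(defn lazy-mapcat
  "Like mapcat, but calls f on an element of coll only once the items
  produced from all earlier elements have been consumed."
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      ;; lazy-cat wraps each argument in lazy-seq, so (f (first s)) runs only
      ;; when an item is requested, and the recursive call runs only after
      ;; the items from (f (first s)) are exhausted.
      (lazy-cat (f (first s))
                (lazy-mapcat f (rest s))))))

;; Stand-in for reading a file: records how many times it has been called.
(def calls (atom 0))

(defn load-items [n]
  (swap! calls inc)
  [(* 10 n) (inc (* 10 n))])

;; Nothing is loaded until an item is actually requested.
(def items (lazy-mapcat load-items [1 2 3 4 5 6]))
```

Under the same assumptions, json-seq could then be written as (lazy-mapcat #(str/split (slurp %) #"\nStatusJSONImpl") (out-files dir-name)). Note that holding on to the head of the sequence, as the def of items does here, still keeps every realized item in memory.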