So in case anyone else stumbles across this topic, I thought I'd share
what little I have learned about the laziness of concat, and by
extension mapcat, as used in this function.

(defn json-seq [dir-name]
  (mapcat #(do (print "f") (str/split (slurp %) #"\nStatusJSONImpl"))
          (out-files dir-name)))

It seems that because of the particular way concat is written (and
the way mapcat applies it to the mapped sequence), it keeps looking
ahead by a few items, so several files are slurped before the first
status is returned.  However this doesn't appear to be a necessary
aspect of its behaviour.  The following version of json-seq,
incorporating what is essentially a rewritten concat, reads each file
only once the previous one has been used up.

(defn json-seq [dir-name]
  (letfn [(cat [xs fs]
               (lazy-seq
                (if-let [xs (seq xs)]
                  (cons (first xs) (cat (rest xs) fs))
                  (if-let [fs (seq fs)]
                    (cat (do
                           (print "f")
                           (str/split (slurp (first fs)) #"\nStatusJSONImpl"))
                         (rest fs))))))]
    (cat '() (out-files dir-name))))
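
For anyone who wants to see the difference without forty files to hand,
here is a rough sketch that swaps slurp for an in-memory stand-in and
records which "files" get read.  The names reads, fake-read,
lazy-cat-map and unchunked are made up for this illustration, and how
far mapcat looks ahead may well differ between Clojure versions, so the
sketch prints the result rather than assuming a number.

```clojure
;; Hypothetical stand-in for (str/split (slurp f) ...): it records each
;; "file" it is asked to read and returns two fake statuses.
(def reads (atom []))

(defn fake-read [n]
  (swap! reads conj n)
  [(str n "-a") (str n "-b")])

;; Same shape as the cat in json-seq above, with the reader passed in as f.
(defn lazy-cat-map [f coll]
  (letfn [(cat [xs fs]
            (lazy-seq
              (if-let [xs (seq xs)]
                (cons (first xs) (cat (rest xs) fs))
                (when-let [fs (seq fs)]
                  (cat (f (first fs)) (rest fs))))))]
    (cat '() coll)))

;; One-element-at-a-time source, so that chunking inside map does not
;; add its own look-ahead to the comparison.
(defn unchunked [coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (first s) (unchunked (rest s))))))

;; Take only the first item from each version and note what was read.
(def mapcat-reads
  (do (reset! reads [])
      (first (mapcat fake-read (unchunked (range 10))))
      @reads))

(def cat-reads
  (do (reset! reads [])
      (first (lazy-cat-map fake-read (unchunked (range 10))))
      @reads))

(println "mapcat read:" mapcat-reads)  ; typically more than one "file"
(println "cat read:   " cat-reads)     ; only the first "file"
```

With the rewritten version only the first "file" is touched when the
first item is requested; with mapcat you should see a few more.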

Alistair


On Jul 26, 2:53 pm, atucker <agjf.tuc...@googlemail.com> wrote:
> Hi all!  I have been trying to use Clojure on a student project, but
> it's becoming a bit of a nightmare.  I wonder whether anyone can
> help?  I'm not studying computer science, and I really need to be
> getting on with the work I'm actually supposed to be doing :)
>
> I am trying to work from a lot of Twitter statuses that I saved to
> text file.  (Unfortunately I failed to escape quotes and such, so the
> JSON is not valid.  Anyone know a good way of coping with that?)
>
> Here is my function:
>
> (defn json-seq []
>   (apply concat
>          (map #(do (print "f") (str/split (slurp %) #"\nStatusJSONImpl"))
>               out-files)))
>
> Now there are forty files and five thousand statuses per file, which
> sounds like a lot, and I don't suppose I can hope to hold them all in
> memory at the same time.  But I had thought that my function might
> produce a lazy sequence that would be more manageable.  However I
> typically get:
>
> twitter.core> (nth (json-seq dir-name) 5)
> ffff"{createdAt=Fri .... etc.   GOOD
>
> twitter.core> (nth (json-seq dir-name) 5000)
> ffff
> Java heap space
>   [Thrown class java.lang.OutOfMemoryError]   BAD
>
> And at this point my REPL is done for.  Any further instruction will
> result in another OutOfMemoryError.  (Surely that has to be a bug just
> there?  Has the garbage collector just given up?)
>
> Anyway I am thinking that the sequence is not behaving as lazily as I
> need it to.  It's not reading one file at a time, and it's not reading
> thirty-two as I might expect from "chunks", but something in the
> middle.  I did try the "dechunkifying" code from page 339 of "Joy of
> Clojure", but that doesn't compile at all :(
>
> I do seem to keep running into memory problems with Clojure.  I have
> 2GB RAM and am using Snow Leopard, Aquamacs 2.0, Clojure 1.2.0 beta1
> and Leiningen 1.2.0.
>
> Cheers
> Alistair
