Thanks Sean, your first suggestion was a very good one :) Tweaking JVM settings feels like advanced magic, and I am a little surprised that it is necessary at such an early stage in my Clojure journey. But googling confirms that the default JVM settings are miserly in the extreme, and that I need at least to add :jvm-opts ["-server"] to my project.clj. I would suggest to the author of Leiningen that perhaps this should be made the default?
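For anyone else hitting this, here is a minimal sketch of the kind of project.clj I mean. The project name, version and dependency string are placeholders rather than a real file, and the -Xmx512m flag is an assumption based on Sean's 512MB suggestion (note that -Xmx sets the maximum heap, not the minimum).

```clojure
;; project.clj (sketch only; name, version and dependency are placeholders)
(defproject twitter "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.2.0-beta1"]]
  ;; -server selects the HotSpot server VM.
  ;; -Xmx512m raises the maximum heap size to 512MB (assumed value).
  :jvm-opts ["-server" "-Xmx512m"])
```

The options should only take effect for JVMs that Leiningen starts after the change, so an already running swank session would presumably need restarting.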
I am getting a lot further now, but still running into OutOfMemory errors sometimes. And it is still the case that once I have suffered one OutOfMemoryError, they keep coming. It does feel as if there must be some large memory leak in the Emacs/lein swank REPL. Is this a recognised issue?

The (print "f") is indeed there only for debugging purposes. I don't think it affects the laziness?

And unfortunately I am not quite sure how to act on your other suggestions regarding processing workflow, since at the moment this is more of an exploratory project.

I shall read the other suggestions regarding laziness later, and hopefully get somewhere with those.

Thanks all!
Alistair.

On Jul 26, 3:18 pm, Sean Devlin <francoisdev...@gmail.com> wrote:
> My first thought is that you need to tweak your JVM settings. Try
> allocating a minimum of 512MB to the total.
>
> My second thought is that you need to use laziness to your advantage.
> Remove the print expression from the mapping operation. It's useful
> for debugging/prototyping, but shouldn't be in the final version.
> Spit the processed json-seq into a file when you're done instead.
> This way you can process one input file at a time, and simply append
> your results to the output file.
>
> My $.02
> Sean
>
> On Jul 26, 9:53 am, atucker <agjf.tuc...@googlemail.com> wrote:
>
> > Hi all! I have been trying to use Clojure on a student project, but
> > it's becoming a bit of a nightmare. I wonder whether anyone can
> > help? I'm not studying computer science, and I really need to be
> > getting on with the work I'm actually supposed to be doing :)
>
> > I am trying to work from a lot of Twitter statuses that I saved to
> > text files. (Unfortunately I failed to escape quotes and such, so the
> > JSON is not valid. Anyone know a good way of coping with that?)
>
> > Here is my function:
>
> > (defn json-seq []
> >   (apply concat
> >          (map #(do (print "f") (str/split (slurp %) #"\nStatusJSONImpl"))
> >               out-files)))
>
> > Now there are forty files and five thousand statuses per file, which
> > sounds like a lot, and I don't suppose I can hope to hold them all in
> > memory at the same time. But I had thought that my function might
> > produce a lazy sequence that would be more manageable. However I
> > typically get:
>
> > twitter.core> (nth (json-seq dir-name) 5)
> > ffff"{createdAt=Fri .... etc. GOOD
>
> > twitter.core> (nth (json-seq dir-name) 5000)
> > ffff
> > Java heap space
> > [Thrown class java.lang.OutOfMemoryError] BAD
>
> > And at this point my REPL is done for. Any further instruction will
> > result in another OutOfMemoryError. (Surely that has to be a bug just
> > there? Has the garbage collector just given up?)
>
> > Anyway I am thinking that the sequence is not behaving as lazily as I
> > need it to. It's not reading one file at a time, and it's not reading
> > thirty-two as I might expect from "chunks", but something in the
> > middle. I did try the "dechunkifying" code from page 339 of "Joy of
> > Clojure", but that doesn't compile at all :(
>
> > I do seem to keep running into memory problems with Clojure. I have
> > 2GB RAM and am using Snow Leopard, Aquamacs 2.0, Clojure 1.2.0 beta1
> > and Leiningen 1.2.0.
>
> > Cheers
> > Alistair