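You can get a lazy sequence of all the lines in all the files with
something like:

(for [file out-files
      line (with-open [r (io/reader file)] (doall (line-seq r)))]
  line)

The doall is needed because with-open closes the reader as soon as its
body returns, so the lines have to be read before that happens. The
sequence is still lazy across files: each file is read only when the
sequence reaches it.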
If "StatusJSONImpl" is on a separate line, you can throw in a :when
clause to filter those lines out:

(for [file out-files
      line (with-open [r (io/reader file)] (doall (line-seq r)))
      :when (not= line "StatusJSONImpl")]
  line)
If it's a line prefix, you can remove it in the body (the ^ anchors
the match to the start of the line, so only the prefix is removed):

(for [file out-files
      line (with-open [r (io/reader file)] (doall (line-seq r)))]
  (string/replace line #"^StatusJSONImpl" ""))
This is all assuming io is an alias for clojure.java.io, string for
clojure.string, and that getting your files line by line is useful.
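
If even one file's worth of lines is too much to hold at once, here is
a rough sketch of a reader that is lazy line by line and closes the
file once the last line has been read. lazy-lines and all-lines are
names I made up for illustration, not library functions. The catch is
that the reader stays open if you stop consuming the sequence before
the end of the file.

```clojure
(require '[clojure.java.io :as io])

(defn lazy-lines
  "Hypothetical helper: a lazy seq of the lines in file. The reader is
  opened on first access and closed once the last line has been read."
  [file]
  (lazy-seq
    (let [^java.io.BufferedReader r (io/reader file)
          step (fn step []
                 (lazy-seq
                   (if-let [line (.readLine r)]
                     (cons line (step))
                     ;; End of file: close the reader and end the seq.
                     (do (.close r) nil))))]
      (step))))

(defn all-lines
  "Hypothetical helper: the lines of every file in files, one file after another."
  [files]
  (for [file files
        line (lazy-lines file)]
    line))
```

With that, (nth (all-lines out-files) 5000) should only read as far as
it needs to, provided nothing else holds on to the head of the
sequence.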
Re OutOfMemoryError: if all the allocated heap memory is really
not freeable, then there's nothing the JVM can do -- it's being asked
to allocate memory for a new object, and there's none available.
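
If you want to see how much heap the JVM actually has to work with,
something like this at the REPL should do it. heap-mb is just a name I
picked; the java.lang.Runtime methods it calls are standard. The
maximum is whatever the JVM was started with (the -Xmx option), and
how you pass that depends on how you launch your REPL.

```clojure
(defn heap-mb
  "Hypothetical helper: the JVM's heap figures in whole megabytes."
  []
  (let [rt (Runtime/getRuntime)
        mb #(quot % (* 1024 1024))]
    {:max   (mb (.maxMemory rt))     ; ceiling the heap may grow to
     :total (mb (.totalMemory rt))   ; heap currently reserved
     :free  (mb (.freeMemory rt))})) ; unused part of :total

(heap-mb)
```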
On Jul 26, 9:53 am, atucker wrote:
> Hi all! I have been trying to use Clojure on a student project, but
> it's becoming a bit of a nightmare. I wonder whether anyone can
> help? I'm not studying computer science, and I really need to be
> getting on with the work I'm actually supposed to be doing :)
>
> I am trying to work from a lot of Twitter statuses that I saved to
> text file. (Unfortunately I failed to escape quotes and such, so the
> JSON is not valid. Anyone know a good way of coping with that?)
>
> Here is my function:
>
> (defn json-seq []
> (apply concat
> (map #(do (print "f") (str/split (slurp %) #"\nStatusJSONImpl"))
> out-files)))
>
> Now there are forty files and five thousand statuses per file, which
> sounds like a lot, and I don't suppose I can hope to hold them all in
> memory at the same time. But I had thought that my function might
> produce a lazy sequence that would be more manageable. However I
> typically get:
>
> twitter.core> (nth (json-seq dir-name) 5)
> ffff"{createdAt=Fri .... etc. GOOD
>
> twitter.core> (nth (json-seq dir-name) 5000)
> ffff
> Java heap space
> [Thrown class java.lang.OutOfMemoryError] BAD
>
> And at this point my REPL is done for. Any further instruction will
> result in another OutOfMemoryError. (Surely that has to be a bug just
> there? Has the garbage collector just given up?)
>
> Anyway I am thinking that the sequence is not behaving as lazily as I
> need it to. It's not reading one file at a time, and it's not reading
> thirty-two as I might expect from "chunks", but something in the
> middle. I did try the "dechunkifying" code from page 339 of "Joy of
> Clojure", but that doesn't compile at all :(
>
> I do seem to keep running into memory problems with Clojure. I have
> 2GB RAM and am using Snow Leopard, Aquamacs 2.0, Clojure 1.2.0 beta1
> and Leiningen 1.2.0.
>
> Cheers
> Alistair