Yet another approach that might work for you, depending on your requirements, is to use a lazy sequence to access your data. I did that for a load of Twitter data that would have been too large to hold in memory at any one time.
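Applied to the integer-pair file described in the quoted question, the lazy-sequence idea could look roughly like the sketch below. It assumes the later processing can be done in a single pass over the lines. Here a line count and a running sum stand in for the real work, so the full vector of vectors never has to exist in memory at once. `parse-pair` and `summarize-pairs` are hypothetical names. They are not part of any library or of the code in the gist.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as str])

;; Hypothetical helper: turn a line such as "1 7" into [1 7].
(defn parse-pair [line]
  (mapv #(Long/parseLong %) (str/split (str/trim line) #"\s+")))

;; Hypothetical one-pass consumer. The lazy line-seq is reduced inside
;; with-open without being bound to a local, so lines that have already
;; been processed can be garbage-collected; only the running
;; [line-count sum] pair is kept.
(defn summarize-pairs [file-name]
  (with-open [rdr (jio/reader file-name)]
    (->> (line-seq rdr)
         (remove str/blank?)
         (reduce (fn [[n total] line]
                   (let [[a b] (parse-pair line)]
                     [(inc n) (+ total a b)]))
                 [0 0]))))
```

If the assignment needs random access to all the pairs at once, a single reduction like this is not enough. In that case the heap-size and compact-data-structure suggestions in Andy's reply are the relevant ones.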
Here's the relevant bit (I think), copied and pasted:

    (require '[clojure.java.io :as jio]
             '[clojure.data.json :as json])

    ;; Sorted full paths of the files in dir-name whose names start with "out".
    (defn out-files [dir-name]
      (let [dir (jio/file dir-name)]
        (map #(str (jio/file dir %))
             (sort (filter #(.startsWith % "out") (.list dir))))))

    ;; Lazy seq of parsed tweets. doall forces each file's lines before
    ;; with-open closes its reader, so files are read roughly one at a time.
    ;; (read-json is from early clojure.data.json; later releases use read-str.)
    (defn tweet-seq [dir-name]
      (map json/read-json
           (mapcat #(with-open [r (jio/reader %)]
                      (doall (line-seq r)))
                   (out-files dir-name))))

In context: https://gist.github.com/2357604

Ali

On Wednesday, 11 April 2012 07:08:11 UTC+1, Andy Fingerhut wrote:
>
> On Apr 9, 2012, at 10:05 PM, Andy Wu wrote:
>
> > Hi there,
> >
> > I'm studying algo-class.org, and one of its programming assignments
> > gives you a file with contents like the following:
> > 1 2
> > 1 7
> > 2 100
> > ...
> >
> > There are roughly 5 million lines, and I want to first construct a
> > vector of vectors of integers for further processing:
> > [[1 2] [1 7] [2 100] ...]
> >
> > Below is what the code looks like:
> >
> > (def int-vec (with-open [rdr (clojure.java.io/reader "<file name>")]
> >                (doall (map convert (line-seq rdr)))))
> >
> > and this leads to an OutOfMemoryError. I tried generating a vector of
> > random integers, and that doesn't cause the error. So I guess it is
> > the temporary objects in convert (it breaks a line down into a list of
> > strings and then converts them to integers) that are causing the
> > memory issue.
> >
> > Can someone advise me on the ideal way to handle this case in
> > Clojure?
>
>
> Most likely any way you want to do it will require more memory than the
> default heap size that your JVM has. You can increase the heap size using
> the -Xmx<num_megabytes>m command-line option, e.g.
>
> java -Xmx2048m -cp clojure.jar clojure.main
>
> Replace the other command-line arguments with what you need if they don't
> match what I wrote. That might be all you need, assuming you have enough
> RAM for the job.
>
> If that isn't all you need, you can consider using different data
> structures that require less memory. One possibility is mutable Java
> arrays.
> Another, more Clojure-y, option is (vector-of :int 1 2), which
> creates an immutable Clojure vector that can only hold Java primitive ints.
>
> Andy
>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
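Sometimes all ~5 million pairs really must stay in memory. For that case, the two suggestions in Andy's reply, mutable Java arrays or `(vector-of :int ...)`, might be sketched as follows. The sketch assumes every value fits in a Java `int` and that blank lines can be skipped. The helper names `parse-int-pair`, `read-pairs-vector-of` and `read-pairs-arrays` are hypothetical. The array version reads the file twice so that it can size the arrays exactly before filling them.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as str])

;; Hypothetical helper: parse a line such as "2 100" into two ints.
;; Assumes both values fit in a Java int.
(defn parse-int-pair [line]
  (let [[a b] (str/split (str/trim line) #"\s+")]
    [(Integer/parseInt a) (Integer/parseInt b)]))

;; Option 1: an immutable vector restricted to primitive ints.
;; Pairs are stored interleaved: a0 b0 a1 b1 ...
(defn read-pairs-vector-of [file-name]
  (with-open [rdr (jio/reader file-name)]
    (->> (line-seq rdr)
         (remove str/blank?)
         (reduce (fn [v line]
                   (let [[a b] (parse-int-pair line)]
                     (conj v a b)))
                 (vector-of :int)))))

;; Option 2: two mutable Java int arrays, one per column.
;; The file is read twice: once to count lines, once to fill the arrays.
(defn read-pairs-arrays [file-name]
  (let [n  (with-open [rdr (jio/reader file-name)]
             (reduce (fn [cnt _] (inc cnt))
                     0
                     (remove str/blank? (line-seq rdr))))
        xs (int-array n)
        ys (int-array n)]
    (with-open [rdr (jio/reader file-name)]
      (doseq [[i line] (map-indexed vector
                                    (remove str/blank? (line-seq rdr)))]
        (let [[a b] (parse-int-pair line)]
          (aset xs i (int a))
          (aset ys i (int b)))))
    [xs ys]))
```

Neither layout allocates a separate vector of boxed numbers for every line. That per-line allocation is likely where much of the memory goes when the whole file is realized as a sequence of small vectors.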