Yet another approach that might work for you, depending on your 
requirements, is to use a lazy sequence to access your data.  I did that 
for a load of Twitter data that would have been too large to hold in memory 
at any one time.

Here's the relevant bit (I think), copied and pasted:

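;; Assumed aliases: jio is clojure.java.io, and json is a JSON library
;; that provides read-json (e.g. clojure.contrib.json).

;; Sorted paths of the files in dir-name whose names start with "out".
(defn out-files [dir-name]
  (let [dir (jio/file dir-name)]
    (map #(str (jio/file dir %))
         (sort (filter #(.startsWith % "out") (.list dir))))))

;; Lazy sequence of parsed tweets, one JSON document per line.  Each file
;; is read in full (doall) before its reader is closed, so only the files
;; currently being consumed have to fit in memory, not the whole data set.
(defn tweet-seq [dir-name]
  (map json/read-json
       (mapcat #(with-open [r (jio/reader %)] (doall (line-seq r)))
               (out-files dir-name))))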
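
For a single file with one record per line, like the one in the original question, the same idea might be sketched as follows. This is only an illustration under assumptions: parse-line and process-lines are names made up for the example, every line is assumed to hold whitespace-separated integers (a blank line would throw), and clojure.java.io is aliased as jio as in the code above. The reduction runs inside with-open, so lines that have already been consumed can be garbage-collected.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as string])

;; Hypothetical helper: parses a line such as "1 2" into a vector [1 2].
(defn parse-line [line]
  (mapv #(Long/parseLong %) (string/split (string/trim line) #"\s+")))

;; Hypothetical helper: reduces f over the parsed lines of file-name,
;; starting from init.  The reader stays open for the whole reduction and
;; nothing holds on to the head of the line-seq.
(defn process-lines [f init file-name]
  (with-open [r (jio/reader file-name)]
    (reduce f init (map parse-line (line-seq r)))))

;; Example: count the pairs whose first element is 1.
;; (process-lines (fn [n [a _]] (if (= a 1) (inc n) n)) 0 "<file name>")
```

This only helps if the further processing can be expressed as a single pass over the lines. If the whole structure really has to be in memory at once, the advice quoted below applies.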

In context:


On Wednesday, 11 April 2012 07:08:11 UTC+1, Andy Fingerhut wrote:
> On Apr 9, 2012, at 10:05 PM, Andy Wu wrote:
> > Hi there,
> > 
> > I'm studying, and one of its programming assignments
> > gives you a file with contents like the following:
> > 1 2
> > 1 7
> > 2 100
> > ...
> > 
> > There are over 5 million lines, and I want to first construct a
> > vector of vectors of integers for further processing:
> > [[1 2][1 7][2 100]...]
> > 
> > Below is what the code looks like:
> > 
> > (def int-vec (with-open [rdr ( "<file name>")]
> >                            (doall (map convert (line-seq rdr)))))
> > 
> > and this leads to an OutOfMemoryError. I tried generating a vector of
> > random integers, and that doesn't produce the error. So I guess it is
> > the temporary objects in convert (it breaks a line down into a list of
> > strings, and then converts them to integers) that are causing the
> > memory issue.
> > 
> > Can someone advise me on what would be an ideal way to handle this case
> > in Clojure?
> 
> Most likely any way you want to do it will require more memory than the 
> default heap size that your JVM has.  You can increase the heap size using 
> the -Xmx<num_megabytes>m command-line option, e.g.
> 
>     java -Xmx2048m -cp clojure.jar clojure.main
> 
> Replace the other command line arguments with what you need if they don't 
> match what I wrote.  That might be all you need, assuming you have enough 
> RAM for the job.
> 
> If that isn't all you need, you can consider using different data 
> structures that require less memory.  One possibility is mutable Java 
> arrays.  Another more Clojure-y method is (vector-of :int 1 2), which 
> creates an immutable Clojure vector that can only hold Java primitive ints.
> 
> Andy
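
As a rough illustration of the vector-of suggestion quoted above, each line could be parsed into a primitive-backed vector instead of an ordinary vector of boxed numbers. This is a sketch only: line->int-pair is a name made up for the example, and it assumes every line holds whitespace-separated integers that fit in a Java int.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: parses a line such as "2 100" into a vector backed
;; by primitive ints, which needs less memory per element than boxed values.
(defn line->int-pair [line]
  (into (vector-of :int)
        (map #(Integer/parseInt %)
             (string/split (string/trim line) #"\s+"))))

;; Usage: (line->int-pair "2 100")
```

The outer collection of 5 million such pairs is still an ordinary vector, so this reduces the per-element cost but does not remove the need for a large enough heap.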
