Yet another approach that might work for you, depending on your 
requirements, is to use a lazy sequence to access your data.  I did that 
for a load of Twitter data that would have been too large to hold in memory 
at any one time.

Here's the relevant bit (I think), copied and pasted:

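;; Assumed aliases (not shown in this excerpt): jio is clojure.java.io,
;; and json is a JSON library that provides read-json.

;; Sorted paths of the files in dir-name whose names start with "out".
(defn out-files [dir-name]
  (let [dir (jio/file dir-name)]
    (map #(str (jio/file dir %))
         (sort (filter #(.startsWith % "out") (.list dir))))))

;; Lazy seq of parsed tweets, one JSON document per line.  Each file is
;; read in full (doall) before its reader is closed, so files are loaded
;; as the seq is consumed rather than all up front.
(defn tweet-seq [dir-name]
  (map json/read-json
       (mapcat #(with-open [r (jio/reader %)] (doall (line-seq r)))
               (out-files dir-name))))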

In context: https://gist.github.com/2357604
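
Applied to the edge-list file from the original question, the same idea 
might look roughly like the sketch below.  It is only a sketch: parse-edge, 
reduce-edges and out-degrees are names I made up for illustration, and it 
assumes every non-blank line holds whitespace-separated integers.  The point 
is that the reduction runs while the reader is still open, so no more than 
the accumulator and the current lines need to be in memory.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as string])

;; Hypothetical helper: parse a line such as "1 7" into [1 7].
(defn parse-edge [line]
  (mapv #(Long/parseLong %) (string/split (string/trim line) #"\s+")))

;; Hypothetical helper: fold f over the parsed lines of one file.
;; reduce is eager, so all reading happens before with-open closes the
;; reader, and lines that have been consumed can be garbage-collected.
(defn reduce-edges [f init file-name]
  (with-open [r (jio/reader file-name)]
    (->> (line-seq r)
         (remove string/blank?)
         (map parse-edge)
         (reduce f init))))

;; Example: count outgoing edges per node without keeping the edge list.
(defn out-degrees [file-name]
  (reduce-edges (fn [acc [from _]] (update-in acc [from] (fnil inc 0)))
                {}
                file-name))
```

Whether this helps depends on the assignment: it only saves memory if the 
result you accumulate is smaller than the full vector of vectors.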

Ali

On Wednesday, 11 April 2012 07:08:11 UTC+1, Andy Fingerhut wrote:
>
> On Apr 9, 2012, at 10:05 PM, Andy Wu wrote:
>
> > Hi there,
> > 
> > I'm studying algo-class.org, and one of its programming assignments
> > gives you a file with contents like the following:
> > 1 2
> > 1 7
> > 2 100
> > ...
> > 
> > There are over 5 million lines, and I want to first construct a
> > vector of vectors of integers for further processing:
> > [[1 2][1 7][2 100]...]
> > 
> > Below is what the code looks like:
> > 
> > (def int-vec (with-open [rdr (clojure.java.io/reader "<file name>")]
> >                            (doall (map convert (line-seq rdr)))))
> > 
> > and this leads to an OutOfMemoryError. I tried generating a vector of
> > random integers, and that doesn't cause the error. So I guess it is
> > the temporary objects in convert (it breaks a line down into a list of
> > strings and then converts them to integers) that cause the memory issue.
> > 
> > Can someone advise me on what would be an ideal way to handle this
> > case in Clojure?
>
>
> Most likely any way you want to do it will require more memory than the 
> default heap size that your JVM has.  You can increase the heap size using 
> the -Xmx<num_megabytes>m option, e.g.
>
> java -Xmx2048m -cp clojure.jar clojure.main
>
> Replace the other command line arguments with what you need if they don't 
> match what I wrote.  That might be all you need, assuming you have enough 
> RAM for the job.
>
> If that isn't all you need, you can consider using different data 
> structures that require less memory.  One possibility is mutable Java 
> arrays.  Another more Clojure-y method is (vector-of :int 1 2), which 
> creates an immutable Clojure vector that can only hold Java primitive ints.
>
> Andy
>
>
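
To make the vector-of suggestion above a little more concrete, here is a 
rough sketch of one way it could be used for the edge list.  The layout 
(one flat vector, two ints per edge) and the names add-edge, 
read-edges-flat and edge-at are my own assumptions for illustration, not 
something from the thread, and it assumes every line holds exactly two 
integers that fit in a Java int.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as string])

;; Hypothetical helper: append the two ints on a line such as "2 100"
;; to the accumulator vector.
(defn add-edge [acc line]
  (let [[from to] (string/split (string/trim line) #"\s+")]
    (-> acc
        (conj (Integer/parseInt from))
        (conj (Integer/parseInt to)))))

;; Read the whole file into one flat primitive-int vector:
;; [from0 to0 from1 to1 ...].  reduce is eager, so the file is fully
;; read before with-open closes the reader.
(defn read-edges-flat [file-name]
  (with-open [r (jio/reader file-name)]
    (reduce add-edge (vector-of :int) (line-seq r))))

;; Hypothetical accessor: the i-th edge as a [from to] pair.
(defn edge-at [edges i]
  [(nth edges (* 2 i)) (nth edges (inc (* 2 i)))])
```

A flat vector avoids allocating a small vector per line, at the cost of 
having to go through an accessor such as edge-at to get a pair back.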
