Hi,

On Feb 22, 11:30 am, Johann Kraus <johann.kr...@gmail.com> wrote:
> However, when loading with read-lines from
> clojure.contrib.duck-streams and (map #(Double/parseDouble %) (.split
> line ",")) clojure requires several GB of RAM.

In this case (as your numbers are delimited with "," rather than by
newlines) read-lines will read just one giant line into RAM (that would
be your first extra gigabyte). After that, calling .split on this string
will create an array of strings, doubling memory consumption again.
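A minimal sketch of a streaming alternative, assuming Clojure 1.2 (for
vector-of, described below): java.util.Scanner can split on commas as it
reads, so neither the giant line nor the array of strings from .split is
ever built. The function name scan-doubles is hypothetical, and Scanner's
regex-based tokenizing may well be slower than a hand-written character
loop.

```clojure
(import '(java.util Scanner Locale))

;; Hypothetical helper (not part of clojure.contrib): reads comma- or
;; whitespace-separated doubles from a Reader into a primitive double
;; vector, holding only Scanner's small internal buffer at any time.
(defn scan-doubles [^java.io.Reader rdr]
  (let [sc (doto (Scanner. rdr)
             (.useDelimiter "[,\\s]+")   ; commas and trailing newline split tokens
             (.useLocale Locale/US))]    ; "." is the decimal separator
    (loop [dv (vector-of :double)]
      ;; stops at end of input or at the first token that is not a double
      (if (.hasNextDouble sc)
        (recur (conj dv (.nextDouble sc)))
        dv))))

;; Usage, with dump.txt as written by write-data below:
;; (def d (with-open [r (java.io.BufferedReader. (java.io.FileReader. "dump.txt"))]
;;          (scan-doubles r)))
```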

> Any suggestions for how
> to get this down to 400MB? And what would be the overhead if reading
> into a clojure vector, which I really would prefer to using java
> arrays?

In master on GitHub there is a new function, vector-of, that creates
Clojure vectors of primitive types. Here is an example of using this new
vector type:

(import (java.io FileWriter BufferedWriter
                 FileReader BufferedReader))

(defn write-data [n file-name]
  ;; Write n random doubles to file-name as one comma-separated line.
  (let [make-double (fn []
                      (-> (rand) (* 1000) double str))]
    (with-open [w (-> file-name FileWriter. BufferedWriter.)]
      (binding [*out* w]
        (dotimes [i (dec n)]
          (printf "%s," (make-double))
          (when (= 0 (mod i 25000))
            (.flush *out*)))
        (print (make-double))))))

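(defn parse-double [source]
  ;; source is a StringBuilder holding the characters of one number
  (Double/parseDouble (str source)))

(defn read-data [file-name]
  ;; Read one char at a time so the whole line is never held in memory;
  ;; each number goes straight into a primitive double vector.
  ;; The ^BufferedReader hint avoids a reflective call per character.
  (with-open [^BufferedReader r (-> file-name FileReader. BufferedReader.)]
    (loop [dv (vector-of :double) buff (StringBuilder.)]
      (let [i (.read r)]
        (if (= -1 i)
          ;; end of stream: add the last number (it has no trailing comma)
          (conj dv (parse-double buff))
          (let [c (char i)]
            (if (= \, c)
              ;; delimiter: parse the buffered number, start a fresh buffer
              (recur (conj dv (parse-double buff)) (StringBuilder.))
              (recur dv (.append buff c)))))))))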

(time (write-data 60000000 "dump.txt"))
(def d (time (read-data "dump.txt")))

On my machine this version needs ~800 MB to load 60,000,000 doubles.
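For a rough sense of the floor: a primitive double takes 8 bytes, so the
raw payload for 60,000,000 values alone is

```latex
60\,000\,000 \times 8\ \text{bytes} = 4.8 \times 10^{8}\ \text{bytes} \approx 458\ \text{MiB}
```

and the rest of the ~800 MB is presumably overhead beyond the numbers
themselves (JVM, heap headroom, the vector's internal tree nodes).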

--
Krešimir Šojat

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
