I wrote a simple word counter, as described at http://ptrace.fefe.de/wp/.
It reads stdin and counts the occurrences of each word. However, I notice
that it runs significantly slower than the Java version in the link.

I was wondering why there is such a dramatic difference. The approach
I took was to create a map keyed on words, with the occurrence count
as the value. Each line read from input is tokenized and the
word counts are updated. The slowdown seems to occur in the inc-count
function, where it "updates" the map using assoc. Is this not the
proper way to approach this in Clojure?
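
To make the approach concrete, here is a minimal sketch of the same
idea on a small in-memory input, without the stdin handling. The name
count-words and the sample string are illustrative assumptions, not part
of the program in this post.

```clojure
;; Hypothetical helper (not in the original program): count words in a
;; sequence of tokens by threading a map through reduce.
(defn count-words [tokens]
  (reduce (fn [counts word]
            (if (empty? word)
              counts                      ; skip empty tokens
              ;; (get counts word 0) supplies 0 for a word not seen yet
              (assoc counts word (inc (get counts word 0)))))
          {}
          tokens))

;; Sample input, split on single spaces as in the program in this post.
(count-words (.split "the quick the lazy the" " "))
```

Each step returns a new map, so the "update" is really a chain of assoc
calls, each building on the result of the previous one.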

I've also noticed that there is a significant speed difference between
conj and assoc. Why is that?
If I understand correctly, both should only create the delta between the
new elements and the old structure; however, assoc appears to perform much
better.
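
For concreteness, the two calls being compared can be written as
follows. This is only a rough sketch: the helper names build-with-assoc
and build-with-conj and the size 100000 are arbitrary assumptions, and a
single run under time is noisy (JIT warm-up, GC), so the numbers are
indicative at best.

```clojure
;; Two ways to add one entry to a map; both return a new map and
;; leave the original untouched.
(def m {"a" 1})

(assoc m "b" 2)      ; key and value passed as separate arguments
(conj m ["b" 2])     ; entry passed as a two-element vector

;; Hypothetical helpers: build a map of n entries with each function.
(defn build-with-assoc [n]
  (reduce (fn [acc i] (assoc acc i i)) {} (range n)))

(defn build-with-conj [n]
  (reduce (fn [acc i] (conj acc [i i])) {} (range n)))

;; Both produce equal maps; only the elapsed time may differ.
(time (count (build-with-assoc 100000)))
(time (count (build-with-conj 100000)))
```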

(import '(java.io BufferedReader InputStreamReader))

(defn inc-count [words word]
  (if (= (.length word) 0)
    words
    (let [cnt (get words word)]
      (if cnt
        (assoc words word (inc cnt))
        (assoc words word 1)))))

(defn sort-words [words]
  (reverse (sort-by (fn [x] (first x))
                    (map (fn [x] [(get words x) x])
                         (keys words)))))

(defn print-words [words]
  (let [head (first words) tail (rest words)]
    (if head
      (do
        (println head)
        (recur tail)))))

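(defn read-words [words line]
  (let [head (first line) tail (rest line)]
    ;; Stop when there is no word left. Testing tail instead would
    ;; return before the last word of each line is counted.
    (if (nil? head)
      words
      ;; time prints the elapsed time of every single inc-count call
      (recur (time (inc-count words head)) tail))))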

(defn read-input []
  (with-open [stream (System/in)]
    (let [buf (BufferedReader. (InputStreamReader. stream))]
      (loop [line (. buf (readLine)) words {}]
        (if (nil? line)
          (print-words (sort-words words))
          (recur (. buf (readLine))
                 (read-words words (. line (split " ")))))))))

(time (read-input))