If you're interested only in counting the number of unique words, then
you don't even need a map. You can get by with a set, like this:

(defn unique-words-in-file
  [file]
  (count (set (split-on-whitespace (slurp file)))))

slurp reads the entire file into memory as a String. The hypothetical
split-on-whitespace takes a String and returns a collection of words
(strings). set takes a collection and returns a set of its distinct
elements, and count returns the number of elements in that set.
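split-on-whitespace is left hypothetical here. One possible definition is sketched below. It assumes a current Clojure with the clojure.string library, which postdates this message, and it treats any run of whitespace as a separator. The helper is an illustration, not a core function. With it, the set-based version runs as written:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of clojure.core): splits text on runs of
;; whitespace and drops the empty string that a leading separator or an
;; empty file would otherwise produce.
(defn split-on-whitespace
  [s]
  (remove str/blank? (str/split s #"\s+")))

(defn unique-words-in-file
  [file]
  ;; set keeps one copy of each word; count returns how many remain
  (count (set (split-on-whitespace (slurp file)))))
```

For a file containing "the cat saw the dog", this returns 4, because "the" is counted once.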

If, on the other hand, you wanted a map from each word in the file to
the number of times that it appears, you might do it like this:

(defn word-counts
  [file]
  (reduce
    (fn [counts word] (assoc counts word (inc (get counts word 0))))
    {}
    (split-on-whitespace (slurp file))))

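reduce starts with the empty map {}. For each word in the file, it
calls the anonymous function supplied as its first argument, passing
the map built so far and the word. That function returns a new map with
the word's count incremented, and reduce passes that map along for the
next word. No global accumulator is needed.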
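The sketch below shows that each step produces a fresh map. It runs the same reducing function with reductions, a core function that returns every intermediate accumulator. It uses a literal list of words instead of a file, and count-word is a hypothetical name for the anonymous function above:

```clojure
;; Hypothetical name for the anonymous reducing function in word-counts.
(defn count-word
  "Returns a new map with word's count incremented; counts is not modified."
  [counts word]
  (assoc counts word (inc (get counts word 0))))

;; Every intermediate map, starting from the initial {}:
;; ({} {"a" 1} {"a" 1, "b" 1} {"a" 2, "b" 1})
(def steps (reductions count-word {} ["a" "b" "a"]))

;; reduce keeps only the last of those maps.
(def final-counts (reduce count-word {} ["a" "b" "a"]))
```

The earlier maps in steps are still intact after later ones are built. This is why threading the map through reduce replaces the mutable global you would use in Common Lisp.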

You could also get the same result with a list comprehension, using
for:

(defn word-counts
  [file]
  (apply merge-with +
    (for [word (split-on-whitespace (slurp file))]
      {word 1})))

Here we emit a map for each word in the file, mapping that word to 1.
Then we merge all the maps together, combining values with + when two
maps contain the same key. This function has a small bug: it throws an
exception if the file contains no words. To fix it, insert the empty
map {} as an additional argument to apply, right after +, so that
merge-with always receives at least one map.
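The corrected for version, with {} passed to apply as described, is sketched below. It works on an in-memory collection of words instead of a file so that it can be tried without one. count-words is a hypothetical name:

```clojure
;; Hypothetical variant of word-counts that takes a collection of words.
(defn count-words
  "Maps each word in words to the number of times it occurs."
  [words]
  ;; {} is the first map handed to merge-with, so an empty words
  ;; still produces {} instead of failing.
  (apply merge-with + {} (for [word words] {word 1})))
```

(count-words ["a" "b" "a"]) returns {"a" 2, "b" 1}, and (count-words []) returns {}. In Clojure 1.2 and later, clojure.core/frequencies computes the same map directly.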

On Oct 19, 9:16 pm, "Tom Emerson" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have a somewhat embarrassing newbie question on the use of hash maps
> in a functional environment.
>
> Consider a little utility that counts the number of unique words in a
> file. A hash map mapping strings to integers is the obvious data
> structure for this, but the fact that (assoc) returns a new map each
> time it is called is tripping me up: since I can't define a 'global'
> hash map to accumulate the counts, do you pass one around in a
> function? or do you structure the code a different way? This is a
> difference from what I would do in Common Lisp, where I would just
> have a global that is used for the collection.
>
> Thanks in advance for your wisdom.
>
>     -tree
>
> --
> Tom Emerson
> [EMAIL PROTECTED]
> http://www.dreamersrealm.net/~tree
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---
