To get my feet wet with Clojure, I'm taking a shot at some of the
"Computer Language Benchmarks Game" benchmarks, starting with a simple
one called k-nucleotide, described at the bottom of this web page:

http://shootout.alioth.debian.org/u32q/benchmark.php?test=knucleotide&lang=all&sort=elapsed

The "full benchmark" is given an input file that is about 242 MB long,
but the program should only be keeping a string that is 125,000,000
characters long from all that it reads, and discarding the rest (I
hope).  According to the web page below, that should require a little
more than 2 bytes per character to store in RAM, so about 250 MB.

http://www.javamex.com/tutorials/memory/string_memory_usage.shtml

Then the code creates a map to count the number of times each length k
substring occurs inside of that big string, for k=1, 2, 3, 4, 6, 12,
and 18.  Those should all be fairly small by comparison to the string.

However, when I allocate 1 GB of space to the heap, the program
eventually crashes due to being out of memory.  It works with 1.25 GB
of heap space, but it seems like there ought to be a way to implement
this in Clojure with only about 0.5 GB.  I don't know how to do that
yet.

Does anyone see anything I'm doing in the code that can hold onto data
that is no longer needed?

Or have any suggestions on type declarations or other techniques to
speed it up?  It is currently taking about 43 minutes of CPU time on
my MacBook Pro, vs. 5 mins for the Perl version, and 5.5 minutes for
the SBCL Common Lisp version that were submitted for the benchmark.

You can get the Clojure source code knucleotide.clj here:

http://homepage.mac.com/jafingerhut/files/knucleotide.clj

The link below will get that, plus some scripts and sample input and
output files I've been testing with, from the benchmark site.  Note
that it is 72 MB, primarily due to the 242 MB input file included.

http://homepage.mac.com/jafingerhut/files/knuc.zip

Note: The intent of the k-nucleotide benchmark programs is that the
code should calculate the contents of each of the mentioned hash
tables completely, even though in most cases it is only printing out a
single entry from them.  No fair calculating only the values that need
to be printed.

Also, the goal is to have programs in different languages that
calculate comparable things using the same algorithms -- it isn't to
find better algorithms to calculate the same thing.


I also wanted to ask about one particular part of the code that I have
already optimized for memory usage, but it confuses me why it helps.
I have Stuart Halloway's Programming Clojure book (recommended), and
on pp. 139-140 he has a short section entitled "Losing Your Head"
mentioning that for sequence-processing code, writing functions that
do *not* "hold on to a reference to the head of the sequence"
(i.e. they "lose their head") can use significantly less memory.  This
is true in situations where nothing else in the program holds onto
that same head, and the sequence is long.

My original version of the "tally" function takes a sequence 'things'
of objects, and returns a map whose keys are the set of unique objects
in 'things', and where the value of (h obj) is the number of times obj
occurs in 'things':

(defn tally-keeps-head
  [things]
  (loop [h {}
         remaining things]
    (if-let [r (seq remaining)]
      (let [key (first r)]
        (recur (assoc h key (inc (get h key 0))) (rest r)))
      h)))

That one apparently keeps a reference to beginning of 'things', even
though things is only used once and never mentioned again.  The pair
of functions below, however, loses its head, even though it is very
similar.  Can someone explain to me why?  I mean, I think I can see
why tally-loses-head-helper loses its head, but why does
tally-loses-head, which like tally-keeps-head uses 'things' once and
then never mentions it again?

(defn tally-loses-head-helper
  [h things]
  (if-let [r (seq things)]
    (let [key (first r)]
      (recur (assoc h key (inc (get h key 0))) (rest r)))
    h))

(defn tally-loses-head
  [things]
  (tally-loses-head-helper {} things))


Thanks,
Andy Fingerhut

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to