To get my feet wet with Clojure, I'm taking a shot at some of the "Computer Language Benchmarks Game" benchmarks, starting with a simple one called k-nucleotide, described at the bottom of this web page:
http://shootout.alioth.debian.org/u32q/benchmark.php?test=knucleotide&lang=all&sort=elapsed The "full benchmark" is given an input file that is about 242 MB long, but the program should only be keeping a string that is 125,000,000 characters long from all that it reads, and discarding the rest (I hope). According to the web page below, that should require a little more than 2 bytes per character to store in RAM, so about 250 MB. http://www.javamex.com/tutorials/memory/string_memory_usage.shtml Then the code creates a map to count the number of times each length k substring occurs inside of that big string, for k=1, 2, 3, 4, 6, 12, and 18. Those should all be fairly small by comparison to the string. However, when I allocate 1 GB of space to the heap, the program eventually crashes due to being out of memory. It works with 1.25 GB of heap space, but it seems like there ought to be a way to implement this in Clojure with only about 0.5 GB. I don't know how to do that yet. Does anyone see anything I'm doing in the code that can hold onto data that is no longer needed? Or have any suggestions on type declarations or other techniques to speed it up? It is currently taking about 43 minutes of CPU time on my MacBook Pro, vs. 5 mins for the Perl version, and 5.5 minutes for the SBCL Common Lisp version that were submitted for the benchmark. You can get the Clojure source code knucleotide.clj here: http://homepage.mac.com/jafingerhut/files/knucleotide.clj The link below will get that, plus some scripts and sample input and output files I've been testing with, from the benchmark site. Note that it is 72 MB, primarily due to the 242 MB input file included. http://homepage.mac.com/jafingerhut/files/knuc.zip Note: The intent of the k-nucleotide benchmark programs is that the code should calculate the contents of each of the mentioned hash tables completely, even though in most cases it is only printing out a single entry from them. No fair calculating only the values that need to be printed. Also, the goal is to have programs in different languages that calculate comparable things using the same algorithms -- it isn't to find better algorithms to calculate the same thing. I also wanted to ask about one particular part of the code that I have already optimized for memory usage, but it confuses me why it helps. I have Stuart Halloway's Programming Clojure book (recommended), and on pp. 139-140 he has a short section entitled "Losing Your Head" mentioning that for sequence-processing code, writing functions that do *not* "hold on to a reference to the head of the sequence" (i.e. they "lose their head") can use significantly less memory. This is true in situations where nothing else in the program holds onto that same head, and the sequence is long. My original version of the "tally" function takes a sequence 'things' of objects, and returns a map whose keys are the set of unique objects in 'things', and where the value of (h obj) is the number of times obj occurs in 'things': (defn tally-keeps-head [things] (loop [h {} remaining things] (if-let [r (seq remaining)] (let [key (first r)] (recur (assoc h key (inc (get h key 0))) (rest r))) h))) That one apparently keeps a reference to beginning of 'things', even though things is only used once and never mentioned again. The pair of functions below, however, loses its head, even though it is very similar. Can someone explain to me why? I mean, I think I can see why tally-loses-head-helper loses its head, but why does tally-loses-head, which like tally-keeps-head uses 'things' once and then never mentions it again? (defn tally-loses-head-helper [h things] (if-let [r (seq things)] (let [key (first r)] (recur (assoc h key (inc (get h key 0))) (rest r))) h)) (defn tally-loses-head [things] (tally-loses-head-helper {} things)) Thanks, Andy Fingerhut --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to [email protected] Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---
