I've been experimenting with reducers using a small example that counts the words in Wikipedia pages by parsing the Wikipedia XML dump. The basic structure of the code is:

    (frequencies (flatten (map get-words (get-pages))))

where get-pages returns a lazy sequence of pages from the XML dump and get-words takes a page and returns a sequence of the words on that page.

The above code takes ~40s to count the words on the first 10000 pages. If I convert that code to use reducers, it runs in ~22s (yay!). If I convert it to use fold and therefore run in parallel, it runs in ~13s on my 4-core MacBook Pro. So it's faster (yay!) but nowhere near 4x faster (boo).

The primary reason for this is that, in order to be able to use fold, I've had to write my own version of frequencies:

    (defn frequencies-parallel [words]
      (r/fold (partial merge-with +)
              (fn [counts x] (assoc counts x (inc (get counts x 0))))
              words))

And, unlike the version in core, this doesn't use transients. If I replace the fold with reduce (i.e. make it run sequentially) it runs in ~43s. So, I *am* getting close to a 4x speedup from parallelising the code, but unfortunately I'm also seeing a 2x slowdown because I can't use transients.

Can anyone think of any way that it would be possible to modify this code to use transients? Or any way to modify reducers to allow transients to be used?

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: p...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en