On 29 Sep 2013, at 22:58, Paul Mooser <taron...@gmail.com> wrote:

> Paul, is there any easy way to get the (small) dataset you're working with, 
> so we can run your actual code against the same data?

The dataset I'm using is a Wikipedia dump, which hardly counts as "small" :-)

Having said that, the first couple of million lines is all you need to 
reproduce the results I'm getting, which you can download with:

curl http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 |
  bunzip2 | head -n 2000000 > enwiki-short.xml

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: p...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en