I put up a parser for the IRC history logs here: https://github.com/andrewmusselman/util/blob/master/irc-parser.sh
I'd like to write one for the user list too, to figure out the most common problems and questions so we can focus our effort on fixing bugs and improving the docs. But the mail archives at https://mail-archives.apache.org/mod_mbox/mahout-user/ are dynamic, loaded in through JavaScript, so parsing them isn't straightforward. Is it possible to get the mbox files directly?
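
To make the goal concrete: if the raw mbox files were available locally, ranking the most frequent subjects could look roughly like the sketch below. It is a minimal example, not a finished tool. It assumes Python's standard `mailbox` module can read the archive, and the function names and the `mahout-user.mbox` file name are hypothetical.

```python
import mailbox
import re
from collections import Counter

# Matches one or more leading reply/forward prefixes ("Re:", "Fw:", "Fwd:").
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Drop reply/forward prefixes, collapse whitespace, and lowercase."""
    subject = _PREFIX.sub("", subject or "")
    return " ".join(subject.split()).lower()


def top_subjects(mbox_path, n=10):
    """Return the n most common normalized subjects in a local mbox file."""
    counts = Counter()
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            # str() guards against Header objects for non-ASCII subjects.
            raw = message["Subject"]
            counts[normalize_subject(str(raw) if raw is not None else "")] += 1
    finally:
        box.close()
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical local file name; replace with a real downloaded archive.
    for subject, count in top_subjects("mahout-user.mbox"):
        print(count, subject)
```

Counting by subject only groups messages into threads; finding the actual recurring problems would still need a pass over the message bodies.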
