It would be really nice if we had more integration tests / example scripts for the various algorithms, like the build-reuters.sh script. These capture problems with the system in the way real users are likely to first encounter it, and give new users an easy way to understand the steps of using Mahout without relying on the wiki. If we were really smart, we'd run them automatically from Hudson as a separate sanity check and then use something like gist to publish them to Confluence automatically, so our examples would always be up to date. But I get ahead of myself.
Would something like the script attached to https://issues.apache.org/jira/browse/MAHOUT-520, which adds a script to run the bayes 20newsgroups example, be appropriate to commit at this point?

Drew
