Drew / all,

I have written a script (about 80% done) for running the clustering job on
the synthetic control data.
Should I upload it to MAHOUT-520, or should I open a new JIRA issue?
I am also thinking of modifying build-reuters.sh to make it more interactive.
Currently it says "uncomment lines for kmeans or lda", but we could instead ask
the user to select whether they want to run kmeans or lda and invoke the
commands for that algorithm accordingly. I have done something similar for the
synthetic control data example.
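
To make the idea concrete, here is a rough sketch of the kind of prompt I
have in mind. The function name choose_algorithm and the menu wording are
only placeholders, not what is in build-reuters.sh today:

```shell
#!/bin/sh
# Hypothetical helper (not part of build-reuters.sh): asks the user which
# algorithm to run and prints the chosen name on stdout.
choose_algorithm() {
  # The menu goes to stderr so that stdout carries only the result.
  echo "Please select a clustering algorithm:" >&2
  echo "  1. kmeans" >&2
  echo "  2. lda" >&2
  printf 'Enter your choice: ' >&2

  # Fail if stdin is closed before a choice was read.
  read -r choice || return 1

  case "$choice" in
    1) echo "kmeans" ;;
    2) echo "lda" ;;
    *) echo "Unknown choice: $choice" >&2; return 1 ;;
  esac
}
```

The script could then do something like algorithm=$(choose_algorithm) || exit 1
and branch on $algorithm to run the commands that are commented out today.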

When we run some of the examples, we check whether HADOOP_HOME is set.
However, HADOOP_HOME might be set while Hadoop itself is not running, in which
case our examples would fail. So I am trying to work out the best way to check
from a shell script that Hadoop is actually up. Once I have this, the script
for the synthetic control data should be complete.
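
One possibility, only as a sketch: besides testing HADOOP_HOME, try a cheap
HDFS command and look at its exit status. The function name hadoop_is_up is
made up, and I am assuming that "hadoop fs -ls /" fails when the NameNode
cannot be reached. This would not catch everything: in standalone (local
filesystem) mode the listing succeeds without any daemon running, and it
says nothing about the JobTracker.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if Hadoop looks usable, 1 otherwise.
hadoop_is_up() {
  # Step 1: HADOOP_HOME must be set and non-empty.
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi

  # Step 2: the hadoop launcher must exist and be executable.
  if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "No hadoop executable found under $HADOOP_HOME/bin" >&2
    return 1
  fi

  # Step 3 (assumption): listing the HDFS root fails when the NameNode
  # is not reachable, so a non-zero exit status means "not up".
  if ! "$HADOOP_HOME/bin/hadoop" fs -ls / >/dev/null 2>&1; then
    echo "Hadoop does not appear to be running" >&2
    return 1
  fi

  return 0
}
```

An example script could call hadoop_is_up || exit 1 before submitting any
job, so the user gets a clear message instead of a failure halfway through.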

I searched Google for best practices / approaches regarding this but
couldn't really find anything solid.
I'd appreciate your thoughts.

regards
Joe.

On Sat, Oct 9, 2010 at 1:10 PM, Gangadhar Nittala
<[email protected]> wrote:

> I think scripts which help users understand the usage of the various
> algorithms will be helpful. For the 0.5 release, if some of the
> algorithms have necessary scripts associated with them, it will make
> it easy for people interested in contributing to run the tests and
> look at the code. While testing the Bayes classifier that was one of
> the issues I faced.
>
> On Fri, Oct 8, 2010 at 8:40 PM, Ted Dunning <[email protected]> wrote:
> > I will build a few SGD based classifier scripts.
> >
> > On Fri, Oct 8, 2010 at 12:29 PM, Drew Farris <[email protected]> wrote:
> >
> >> Perhaps it would be easy for the individuals doing tests for 0.4 to at
> >> least take a transcript of the commands they're using so that they can
> >> eventually be changed into these sorts of scripts.
> >>
> >> On Fri, Oct 8, 2010 at 3:25 PM, Robin Anil <[email protected]> wrote:
> >> > +1 for integration script
> >> >
> >> > On Sat, Oct 9, 2010 at 12:52 AM, Drew Farris <[email protected]> wrote:
> >> >
> >> >> It sure would be really nice if we had more integration tests /
> >> >> example scripts for the various algorithms like build-reuters.sh
> >> >> script. These capture problems with the system in the way real users
> >> >> are likely to first encounter it, and provide an easy way for new
> >> >> users to understand the steps of using mahout externally to the wiki.
> >> >> If we were really smart, we'd run them automatically from hudson as a
> >> >> separate sanity check and then use something like gist to publish them
> >> >> to confluence automatically so our examples would always be up to
> >> >> date. But I get ahead of myself.
> >> >>
> >> >> Would something like the script attached to
> >> >> https://issues.apache.org/jira/browse/MAHOUT-520, which adds a script
> >> >> to run the bayes 20newsgroups example, be appropriate to commit at
> >> >> this point?
> >> >>
> >> >> Drew
> >> >>
> >> >
> >>
> >
>
