[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837376#action_12837376
 ] 

Drew Farris commented on MAHOUT-301:
------------------------------------

{quote}
This wasn't a problem with my patch, right?  That was an issue of the mahout 
script in trunk itself?
{quote}

Yes, it was a problem with the script in trunk. I believe it was caused by 
the job files being on the classpath instead of the individual dependency 
jars. Adding a job file to the classpath does not also add the dependency jars 
it contains. So no, you didn't introduce this, but it should be fixed (and is 
in the patch).
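
To illustrate the point about job files, here is a minimal sketch of building 
a classpath from the individual jars in a directory. The function name 
build_jar_classpath and the directory argument are hypothetical and are not 
taken from the patch; the only claim is that each jar has to be listed itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the patch): join every *.jar found in a
# directory into a colon-separated classpath string. Files with other
# extensions, such as *.job archives, are deliberately left out, because
# the JVM does not look inside them for nested jars.
build_jar_classpath() {
  dir="$1"
  cp=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue      # the glob matched nothing
    cp="${cp:+$cp:}$jar"           # add ':' only between entries
  done
  printf '%s\n' "$cp"
}
```

The result could then be passed to java via -classpath.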

{quote}
What is the -core option for?  I've never used it, how does it work?
{quote}

When you're running bin/mahout in the context of a build, the -core option 
tells it to use the build classpath instead of the classpath used for a 
binary release. This just follows the pattern established (by Doug?) in the 
Hadoop and Nutch launch scripts.
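
A rough sketch of that pattern, for anyone who hasn't seen it. The function 
name choose_classpath and the directory layout (core/target/classes, 
utils/target/classes, lib/*) are assumptions made for illustration; they are 
not copied from bin/mahout.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the -core switch. Paths are assumed, not taken
# from the real script.
choose_classpath() {
  home="$1"
  if [ "${2:-}" = "-core" ]; then
    # Running from a source checkout: use the compiled build output.
    printf '%s\n' "$home/core/target/classes:$home/utils/target/classes"
  else
    # Running from a binary release: use the packaged jars.
    # The quoted '*' is printed literally; the JVM expands it, not the shell.
    printf '%s\n' "$home/lib/*"
  fi
}
```

With this shape, the rest of the script does not need to know which mode it 
is in; it just uses whatever classpath comes back.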

{quote}
Also added a help message for the 'run' argument.
{quote}

Near line 72 in bin/mahout (this is different from the --help question I 
had):

{code}
  echo "  seq2sparse            generate sparse vectors from a sequence file"
  echo "  vectordump            dump vectors from a sequence file"
  echo "  run                   run mahout tasks using the MahoutDriver, see: http://cwiki.apache.org/MAHOUT/mahoutdriver.html";
{code}

{quote}
So you already added the ability to load via classpath, right? If we merge that 
way of thinking with what I'm currently working on (having a configurable 
"MAHOUT_CONF_DIR" which is used for all these props files), we could just have 
the mahout shell script just add MAHOUT_CONF_DIR to the classpath (the way you 
already have it adding the hardwired core/src/main/resources directory) and 
then it would work that way.
{quote}

Yep, that should do it. As long as MAHOUT_CONF_DIR appears before 
src/main/resources on the classpath, we should be good to go. It should be 
added outside of the section of the script that checks whether -core has been 
specified on the command line.
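
Roughly what I have in mind, as a sketch only. The function name 
assemble_classpath and the paths other than core/src/main/resources are 
assumptions. The ordering matters because a resource lookup returns the copy 
from the first classpath entry that contains it, so props files in 
MAHOUT_CONF_DIR would shadow the bundled defaults.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: MAHOUT_CONF_DIR is placed first, outside the -core
# branch, so it takes effect in both build and release modes.
assemble_classpath() {
  conf_dir="$1"
  home="$2"
  cp="$conf_dir"                                  # unconditional, first
  cp="$cp:$home/core/src/main/resources"          # bundled defaults, after
  if [ "${3:-}" = "-core" ]; then
    cp="$cp:$home/core/target/classes"            # assumed build output dir
  else
    cp="$cp:$home/lib/*"                          # assumed release jar dir
  fi
  printf '%s\n' "$cp"
}
```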



> Improve command-line shell script by allowing default properties files
> ----------------------------------------------------------------------
>
>                 Key: MAHOUT-301
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-301
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs.  Utilizes 
> org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from 
> a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop jar path/to/job 
> org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in 
> long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option 
> isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just 
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props 
> file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file 
> "core/src/main/resources/driver.classes.props" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully 
> qualified class names.  This file may
>  * instead be overridden on the command line by having the first argument be 
> some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the 
> class to be run (as defined in the
>  * driver.classes.props file).  After this, if the next argument ends in 
> ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and 
> key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  *   main(new String[] { "--input", "/path/to/my/input", "--output", 
> "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further 
> command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in 
> .props", but gives the idea.  Really helps cut down on repetitive long 
> command lines, and lets defaults be put in props files instead of being 
> locked into the code.
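
The props-to-arguments expansion described in the quoted javadoc can be 
sketched in shell as follows. This is only an illustration of the mapping, not 
the MahoutDriver implementation (which is Java); the function name 
props_to_args is hypothetical, and it handles only plain key=value lines 
rather than the full Java properties syntax.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: turn each key=value line of a props file into
# a long-form option followed by its value, one argument per output line.
props_to_args() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;        # skip blank lines and comments
      *=*) ;;                     # key=value: handled below
      *) continue ;;              # anything else is ignored in this sketch
    esac
    key="${line%%=*}"             # text before the first '='
    value="${line#*=}"            # text after the first '='
    printf '%s\n%s\n' "--$key" "$value"
  done < "$1"
}
```

Arguments given on the command line after the props file would then be 
applied after these defaults, so that they override them.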
