[ 
https://issues.apache.org/jira/browse/MAHOUT-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Prasanna Kumar updated MAHOUT-502:
--------------------------------------

    Attachment: MAHOUT-502.patch

The patch modifies org.apache.mahout.common.CommandLineUtil to add a footer note.

> Adding footer note to command line utility
> ------------------------------------------
>
>                 Key: MAHOUT-502
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-502
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>            Reporter: Joe Prasanna Kumar
>            Priority: Trivial
>         Attachments: MAHOUT-502.patch
>
>
> Hi all,
> Since ClusterDumper doesn't seem to have detailed documentation, I created 
> a page: https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper
> While playing around with the clusterdump utility, I learned that it can be 
> run on Hadoop or as a standalone Java program.
> As most of you are aware, when it is executed on Hadoop, seqFileDir and 
> pointsDir should be HDFS locations; otherwise they should be local file 
> system paths.
> Since some of the clustering-related wiki pages say that we can get the 
> output from HDFS and then run clusterdump, I had assumed that clusterdump 
> would always read data from the local FS.
> I am not sure whether newcomers would make the same assumption, so I was 
> wondering whether we need to make this explicit by changing the help text 
> of clusterdump.
> Currently ClusterDumper.java has:
>  addOption(SEQ_FILE_DIR_OPTION, "s", "The directory containing Sequence Files 
> for the Clusters", true);
> Should we specify something like:
>  addOption(SEQ_FILE_DIR_OPTION, "s", "The directory (HDFS if using Hadoop / 
> Local filesystem if on standalone mode) containing Sequence Files for the 
> Clusters", true);
> and so on?
> The problem with this approach is that it is repetitive: we would need to 
> change the text in quite a few places (I believe vectordump follows the 
> same principle).
> Or should we modify CommandLineUtil to add a generic message to the help 
> output, stating that when running on Hadoop the directories should 
> reference HDFS locations, and otherwise the local FS?
> How about adding it to the footer, like this:
> formatter.setFooter("Specify HDFS directories while running hadoop; else 
> specify local File System directories");
> formatter.printFooter();
> I'd appreciate your feedback / thoughts.
> Thanks,
> Joe.
> from  Jeff Eastman <[email protected]>
> reply-to      [email protected]
> to    [email protected]
> date  Fri, Sep 3, 2010 at 2:45 PM
> subject       Re: ClusterDumper - Hadoop or standalone ?
> mailed-by     mahout.apache.org
> +1 to generic message approach
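
For illustration, here is a minimal sketch of the generic-footer idea, assuming a shared help printer that every tool calls. HelpFooterSketch and renderHelp are hypothetical names invented for this example; they are not Mahout's CommandLineUtil or the attached patch, and the sketch uses only the JDK so it can be run on its own. The footer wording and the seqFileDir description are taken from the proposal above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a shared help printer. This is NOT Mahout's
// CommandLineUtil; it only illustrates keeping the HDFS/local note in one place.
final class HelpFooterSketch {

    // Footer wording copied from the proposal in the issue description.
    static final String FOOTER =
        "Specify HDFS directories while running hadoop; "
        + "else specify local File System directories";

    private HelpFooterSketch() {}

    // Renders a usage line, one line per option, then the shared footer.
    // Option descriptions stay unchanged; only the footer carries the note.
    static String renderHelp(String tool, Map<String, String> options) {
        StringBuilder sb = new StringBuilder();
        sb.append("Usage: ").append(tool).append(" [options]\n");
        for (Map.Entry<String, String> e : options.entrySet()) {
            sb.append("  --").append(e.getKey())
              .append("  ").append(e.getValue()).append('\n');
        }
        sb.append(FOOTER).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("seqFileDir",
            "The directory containing Sequence Files for the Clusters");
        System.out.print(renderHelp("clusterdump", options));
    }
}
```

Because the note lives in the shared printer, tools such as clusterdump and vectordump would pick it up without any change to their individual option descriptions, which is the repetition the proposal wants to avoid.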

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
