On Mon, Jul 11, 2011 at 8:15 AM, Dhruv Kumar <[email protected]> wrote:

> On Fri, Jul 8, 2011 at 11:05 AM, Jake Mannix <[email protected]> wrote:
>
> > At the end of the exception trace, you should see the list of options it
> > will take. As I said, it's missing a "--help" option, but all of the Mahout
> > programs, if given an incorrect argument, will print this stack trace
> > followed by the list of arguments you *could* use.
> >
>
> Seems to violate the principle of least astonishment.
>

Of course it does, which is why I said that it was a bug in that particular
script.


> If this is a systemic issue with all the command-line scripts, I think we
> should create a JIRA issue for it. I can work on it alongside my GSoC
> project.
>

It is specific to seqdumper and vectordump. All the other actions in the
script do the right thing, as far as I know.


> Why does this happen in the first place?
>

I think it's a really simple, easy-to-fix issue: VectorDumper.java has this
line in main():

    Group group = gbuilder.withName("Options").withOption(seqOpt).withOption(outputOpt)
        .withOption(dictTypeOpt).withOption(dictOpt).withOption(csvOpt)
        .withOption(vectorAsKeyOpt).withOption(printKeyOpt).withOption(sizeOpt)
        .create();

but it never calls withOption(helpOpt), even though helpOpt is defined above,
so the parser never accepts that option and throws on --help. Adding
.withOption(helpOpt) to that chain should make --help do the right thing.
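To see why a defined-but-unregistered option fails this way, here is a
minimal, self-contained sketch. It does NOT use commons-cli2: Group,
buildGroup and run are hypothetical stand-ins that only mimic the behaviour
in the stack trace (an option missing from the group is rejected as
"Unexpected ... while processing Options"), and only long --name options are
modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the bug, NOT the commons-cli2 API: a group only
// accepts options that were added to it with withOption(...).
public class HelpOptionSketch {

    static final class Group {
        private final Set<String> names = new LinkedHashSet<>();

        // Registers a long option name; returns this so calls can be chained.
        Group withOption(String name) {
            names.add(name);
            return this;
        }

        // Returns the options seen on the command line, rejecting unknown ones.
        List<String> parse(String[] args) {
            List<String> seen = new ArrayList<>();
            for (String arg : args) {
                if (!arg.startsWith("--")) {
                    continue; // values and short options are ignored in this sketch
                }
                if (!names.contains(arg)) {
                    throw new IllegalArgumentException(
                            "Unexpected " + arg + " while processing Options");
                }
                seen.add(arg);
            }
            return seen;
        }
    }

    // Mirrors the option chain in VectorDumper.main(); includeHelp adds the
    // missing withOption(helpOpt) call.
    static Group buildGroup(boolean includeHelp) {
        Group group = new Group()
                .withOption("--seqFile").withOption("--output")
                .withOption("--dictionaryType").withOption("--dictionary")
                .withOption("--csv").withOption("--useKey")
                .withOption("--printKey").withOption("--sizeOnly");
        if (includeHelp) {
            group.withOption("--help");
        }
        return group;
    }

    // Returns usage text when --help is given, or null to continue the dump.
    static String run(String[] args, boolean includeHelp) {
        List<String> options = buildGroup(includeHelp).parse(args);
        if (options.contains("--help")) {
            return "Usage: [--seqFile <seqFile> --output <output> ...]";
        }
        return null;
    }

    public static void main(String[] args) {
        String[] help = {"--help"};
        try {
            run(help, false);
        } catch (IllegalArgumentException e) {
            System.out.println("without helpOpt: " + e.getMessage());
        }
        System.out.println("with helpOpt: " + run(help, true));
    }
}
```

Running main prints the "Unexpected --help while processing Options"
message for the group built without the help option, and the usage line for
the group that includes it.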

  -jake


>
> >
> > In this case they're printed below; I'll cut out the part you need:
> >
> > ---------------
> > Usage:
> >
> >  [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> >
> > Options
> >
> >  --seqFile (-s) seqFile                   The Sequence File containing the
> >                                           Vectors
> >  --output (-o) output                     The output file.  If not specified,
> >                                           dumps to the console
> >  --dictionaryType (-dt) dictionaryType    The dictionary file type
> >                                           (text|sequencefile)
> >  --dictionary (-d) dictionary             The dictionary file.
> >  --csv (-c)                               Output the Vector as CSV.
> >                                           Otherwise it substitutes in the
> >                                           terms for vector cell entries
> >  --useKey (-u)                            If the Key is a vector, then dump
> >                                           that instead
> >  --printKey (-p)                          Print out the key as well,
> >                                           delimited by a tab (or the value
> >                                           if useKey is true)
> >  --sizeOnly (-sz)                         Dump only the size of the vector
> >
> > ----------------
> >
> > This means you want to do:
> >
> > ./bin/mahout vectordump -s path_to_docTopics_output \
> >     -o path_you_want_to_write_text_output_to
> >
> > and then just look in path_you_want_to_write_text_output_to; it should have
> > what you want.
> >
> >  -jake
> >
> > On Fri, Jul 8, 2011 at 6:16 AM, huaiyang gongzi <[email protected]> wrote:
> >
> > > Thanks, Jake. But after typing mahout vectordump --help, I got something
> > > like this:
> > >
> > > 11/07/08 09:14:25 ERROR vectors.VectorDumper: Exception
> > > org.apache.commons.cli2.OptionException: Unexpected --help while processing Options
> > >        at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
> > >        at org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:100)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >        at java.lang.reflect.Method.invoke(Method.java:597)
> > >        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >        at java.lang.reflect.Method.invoke(Method.java:597)
> > >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > > Usage:
> > >
> > >  [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> > >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> > > Options
> > >
> > >  --seqFile (-s) seqFile                   The Sequence File containing the
> > >                                           Vectors
> > >  --output (-o) output                     The output file.  If not specified,
> > >                                           dumps to the console
> > >  --dictionaryType (-dt) dictionaryType    The dictionary file type
> > >                                           (text|sequencefile)
> > >  --dictionary (-d) dictionary             The dictionary file.
> > >  --csv (-c)                               Output the Vector as CSV.
> > >                                           Otherwise it substitutes in the
> > >                                           terms for vector cell entries
> > >  --useKey (-u)                            If the Key is a vector, then dump
> > >                                           that instead
> > >  --printKey (-p)                          Print out the key as well,
> > >                                           delimited by a tab (or the value
> > >                                           if useKey is true)
> > >  --sizeOnly (-sz)                         Dump only the size of the vector
> > > 11/07/08 09:14:25 INFO driver.MahoutDriver: Program took 30 ms
> > >
> > >
> > > On Thu, Jul 7, 2011 at 5:56 PM, Jake Mannix <[email protected]> wrote:
> > >
> > > > On Thu, Jul 7, 2011 at 5:53 PM, wine lover <[email protected]> wrote:
> > > >
> > > > > Dear All,
> > > > >
> > > > > After running LDA analysis, I got the docTopic file, which is a regular
> > > > > sequence file. How can I convert it into a readable format? I searched
> > > > > for vectordumper and vectordump but found nothing useful, such as how
> > > > > to use them on the command line. Thanks.
> > > > >
> > > >
> > > > When you say you "searched vectordumper/vectordump", do you mean you
> > > > looked through the code for it, or that you ran it and it didn't do
> > > > what you wanted?
> > > >
> > > > If you're just not sure how to use it, try running "./bin/mahout" from
> > > > your distribution directory with no arguments; it will print out a list
> > > > of possible commands, one of which is vectordump. If you run vectordump
> > > > itself with no arguments, it sadly exits silently without telling you
> > > > what the usage is (this is a bug!), but if you give it an illegal
> > > > argument, like
> > > >
> > > > ./bin/mahout vectordump --help
> > > >
> > > > You'll see:
> > > > Usage:
> > > >
> > > >  [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> > > >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> > > >
> > > > Options
> > > >
> > > >  --seqFile (-s) seqFile                   The Sequence File containing the
> > > >                                           Vectors
> > > >  --output (-o) output                     The output file.  If not specified,
> > > >                                           dumps to the console
> > > >  --dictionaryType (-dt) dictionaryType    The dictionary file type
> > > >                                           (text|sequencefile)
> > > >  --dictionary (-d) dictionary             The dictionary file.
> > > >  --csv (-c)                               Output the Vector as CSV.
> > > >                                           Otherwise it substitutes in the
> > > >                                           terms for vector cell entries
> > > >  --useKey (-u)                            If the Key is a vector, then dump
> > > >                                           that instead
> > > >  --printKey (-p)                          Print out the key as well,
> > > >                                           delimited by a tab (or the value
> > > >                                           if useKey is true)
> > > >  --sizeOnly (-sz)                         Dump only the size of the vector
> > > >
> > > >
> > > > -----
> > > >
> > > > If you use these instructions to point to the docTopics output location,
> > > > you can have it print out p(topic | document) for each topic/document
> > > > pair in your collection.
> > > >
> > > >  -jake
> > > >
> > >
> >
>
