On Mon, Jul 11, 2011 at 8:15 AM, Dhruv Kumar <[email protected]> wrote:
> On Fri, Jul 8, 2011 at 11:05 AM, Jake Mannix <[email protected]> wrote:
>
> > At the end of the exception trace, you should see the list of options
> > which it will take. As I said, it's missing a "--help" option, but all
> > of the Mahout programs, if given an incorrect argument, will give this
> > stack trace, followed by the list of arguments you *could* use.
> >
>
> Seems to violate the principle of least astonishment.
>

Of course it does, which is why I said that it was a bug in that
particular script.

> If this is a systemic issue with all the command line scripts, I think we
> should create a JIRA issue for it. I can work on it on the side, alongside
> my GSoC project.
>

It is specific to seqdumper and vectordumper. As far as I know, all other
actions in the script do the right thing.

> Why does this happen in the first place?
>

I think it's a really simple, easy-to-fix issue. VectorDumper.java has this
line in main():

    Group group = gbuilder.withName("Options").withOption(seqOpt).withOption(outputOpt)
        .withOption(dictTypeOpt).withOption(dictOpt).withOption(csvOpt)
        .withOption(vectorAsKeyOpt).withOption(printKeyOpt).withOption(sizeOpt)
        .create();

but it does not have a "withOption(helpOpt)", even though helpOpt is defined
above it, so the parser never checks for that option. Adding it to the
chain should make --help do the right thing.

  -jake

> >
> > In this case, they're printed below; I'll cut out the part you need:
> >
> > ---------------
> > Usage:
> >
> > [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> >
> > Options
> >
> >   --seqFile (-s) seqFile                  The Sequence File containing
> >                                           the Vectors
> >   --output (-o) output                    The output file. If not
> >                                           specified, dumps to the console
> >   --dictionaryType (-dt) dictionaryType   The dictionary file type
> >                                           (text|sequencefile)
> >   --dictionary (-d) dictionary            The dictionary file.
> >   --csv (-c)                              Output the Vector as CSV.
> >                                           Otherwise it substitutes in the
> >                                           terms for vector cell entries
> >   --useKey (-u)                           If the Key is a vector, then
> >                                           dump that instead
> >   --printKey (-p)                         Print out the key as well,
> >                                           delimited by a tab (or the value
> >                                           if useKey is true)
> >   --sizeOnly (-sz)                        Dump only the size of the vector
> >
> > ----------------
> >
> > This means you want to do:
> >
> >   ./bin/mahout vectordump -s path_to_docTopics_output -o path_you_want_to_write_text_output_to
> >
> > and then just look in path_you_want_to_write_text_output_to, and it
> > should have what you want.
> >
> >   -jake
> >
> > On Fri, Jul 8, 2011 at 6:16 AM, huaiyang gongzi <[email protected]> wrote:
> >
> > > Thanks, Jake. But after typing "mahout vectordump --help", I got
> > > something like this:
> > >
> > > 11/07/08 09:14:25 ERROR vectors.VectorDumper: Exception
> > > org.apache.commons.cli2.OptionException: Unexpected --help while processing Options
> > >     at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
> > >     at org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:100)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > > Usage:
> > >
> > > [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> > >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> > > Options
> > >
> > >   --seqFile (-s) seqFile                  The Sequence File containing
> > >                                           the Vectors
> > >   --output (-o) output                    The output file. If not
> > >                                           specified, dumps to the console
> > >   --dictionaryType (-dt) dictionaryType   The dictionary file type
> > >                                           (text|sequencefile)
> > >   --dictionary (-d) dictionary            The dictionary file.
> > >   --csv (-c)                              Output the Vector as CSV.
> > >                                           Otherwise it substitutes in the
> > >                                           terms for vector cell entries
> > >   --useKey (-u)                           If the Key is a vector, then
> > >                                           dump that instead
> > >   --printKey (-p)                         Print out the key as well,
> > >                                           delimited by a tab (or the value
> > >                                           if useKey is true)
> > >   --sizeOnly (-sz)                        Dump only the size of the vector
> > > 11/07/08 09:14:25 INFO driver.MahoutDriver: Program took 30 ms
> > >
> > >
> > > On Thu, Jul 7, 2011 at 5:56 PM, Jake Mannix <[email protected]> wrote:
> > >
> > > > On Thu, Jul 7, 2011 at 5:53 PM, wine lover <[email protected]> wrote:
> > > >
> > > > > Dear All,
> > > > >
> > > > > After running LDA analysis, I got the docTopic file, which is a
> > > > > regular sequence file. How can I convert it into a readable format?
> > > > > I searched for vectordumper and vectordump, but did not find anything
> > > > > useful, such as how to use it on the command line. Thanks.
> > > > >
> > > >
> > > > So when you say you "searched vectordumper/vectordump", do you mean you
> > > > looked through the code for it, or that you used it and it didn't do
> > > > what you wanted?
> > > >
> > > > If you're just not sure how to use it, try running "./bin/mahout" from
> > > > your distribution directory with no arguments, and it will print out a
> > > > list of possible commands, one of which is vectordump. If you run
> > > > vectordump with no arguments, it will sadly exit silently without
> > > > telling you what the usage is (this is a bug!), but if you give it an
> > > > illegal argument, like
> > > >
> > > >   ./bin/mahout vectordump --help
> > > >
> > > > you'll see:
> > > >
> > > > Usage:
> > > >
> > > > [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> > > >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> > > >
> > > > Options
> > > >
> > > >   --seqFile (-s) seqFile                  The Sequence File containing
> > > >                                           the Vectors
> > > >   --output (-o) output                    The output file. If not
> > > >                                           specified, dumps to the console
> > > >   --dictionaryType (-dt) dictionaryType   The dictionary file type
> > > >                                           (text|sequencefile)
> > > >   --dictionary (-d) dictionary            The dictionary file.
> > > >   --csv (-c)                              Output the Vector as CSV.
> > > >                                           Otherwise it substitutes in the
> > > >                                           terms for vector cell entries
> > > >   --useKey (-u)                           If the Key is a vector, then
> > > >                                           dump that instead
> > > >   --printKey (-p)                         Print out the key as well,
> > > >                                           delimited by a tab (or the value
> > > >                                           if useKey is true)
> > > >   --sizeOnly (-sz)                        Dump only the size of the vector
> > > >
> > > > -----
> > > >
> > > > If you use these instructions to point to the docTopics output
> > > > location, you can have it print out p(topic | document) for each
> > > > topic/document pair in your collection.
> > > >
> > > >   -jake
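
A minimal way to see why the missing "withOption(helpOpt)" matters, without Mahout or commons-cli2 on the classpath, is sketched below. It is a hypothetical, stdlib-only stand-in: the Group, parse and run names are made up for illustration and are not the commons-cli2 or Mahout API. The parser only accepts options registered in its group, so an unregistered --help is rejected with the same kind of "Unexpected --help while processing Options" error seen in the stack trace, while the group with --help registered prints the usage. As an assumption about the desired behavior, it also prints the usage when given no arguments instead of exiting silently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, stdlib-only sketch of the VectorDumper --help bug.
 * Group, parse and run are illustrative names, NOT the commons-cli2 or Mahout API.
 */
public class HelpOptionSketch {

    /** Raised for an option the group does not know; the message mimics the one in the thread. */
    static class UnexpectedOptionException extends Exception {
        UnexpectedOptionException(String option) {
            super("Unexpected " + option + " while processing Options");
        }
    }

    /** Hypothetical stand-in for a cli2 Group: only registered long options are accepted. */
    static class Group {
        private final Map<String, String> options = new LinkedHashMap<>(); // name -> description

        Group withOption(String longName, String description) {
            options.put(longName, description);
            return this;
        }

        /** Returns the registered options found in args; rejects the first unregistered "--" token. */
        List<String> parse(String[] args) throws UnexpectedOptionException {
            List<String> seen = new ArrayList<>();
            for (String arg : args) {
                if (!arg.startsWith("--")) {
                    continue; // option values such as file paths are skipped in this sketch
                }
                if (!options.containsKey(arg)) {
                    throw new UnexpectedOptionException(arg);
                }
                seen.add(arg);
            }
            return seen;
        }

        String usage() {
            StringBuilder sb = new StringBuilder("Usage:\n");
            options.forEach((name, desc) -> sb.append("  ").append(name).append("  ").append(desc).append('\n'));
            return sb.toString();
        }
    }

    /** Returns what the tool would print for the given arguments. */
    static String run(Group group, String... args) {
        if (args.length == 0) {
            return group.usage(); // print usage rather than exiting silently
        }
        try {
            List<String> seen = group.parse(args);
            if (seen.contains("--help")) {
                return group.usage();
            }
            return "OK " + seen;
        } catch (UnexpectedOptionException e) {
            // The broken path: an error first, then the usage text.
            return "ERROR " + e.getMessage() + "\n" + group.usage();
        }
    }

    public static void main(String[] args) {
        // As in the thread: a help option exists but is never added to the group.
        Group broken = new Group()
                .withOption("--seqFile", "The Sequence File containing the Vectors")
                .withOption("--output", "The output file");
        // Same group with the help option registered, i.e. the proposed fix.
        Group fixed = new Group()
                .withOption("--seqFile", "The Sequence File containing the Vectors")
                .withOption("--output", "The output file")
                .withOption("--help", "Print out help");

        System.out.println(run(broken, "--help")); // ERROR Unexpected --help ..., then usage
        System.out.println(run(fixed, "--help"));  // usage only
    }
}
```

In VectorDumper itself, the corresponding change is the one Jake describes: add .withOption(helpOpt) to the gbuilder chain so the real parser recognizes --help instead of rejecting it.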
