[ 
https://issues.apache.org/jira/browse/MAHOUT-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shige Takeda updated MAHOUT-587:
--------------------------------

    Attachment: 0001-added-D-option-support-to-seq2sparse.patch

I made a preliminary patch to show how this could be fixed. Here are some 
comments:

Test case:
$MAHOUT_HOME/bin/mahout seq2sparse --input text_output --output seq_output -Dk=v

where "text_output" contains the output of "seqdirectory".

- This fix works when I run "mahout" from the command line against a Hadoop 
cluster. A unit test does not work, though: -Dkey=value is not parsed by 
ToolRunner but is passed through to SparseVectorsFromSequenceFiles, resulting 
in a parse error. Did I make a mistake? Perhaps JUnit doesn't really mimic a 
real Hadoop environment? I'm wondering whether there is any unit test that 
launches "mahout" from the command line rather than calling the static 
F.main(String[]) method.

- I was not sure about the convention for parameter order, i.e., 
F(a, b, conf) or F(conf, a, b), so I just followed 
DictionaryVectorizer.createTermFrequencyVectors, which places the 
Configuration parameter after the Paths. IMHO, Configuration should be either 
the first or the last argument.
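
To make the first point concrete, here is a minimal, Hadoop-free sketch of the 
separation step that has to happen before the tool's own option parser sees 
the arguments: -Dkey=value pairs are pulled out into a property map and only 
the rest is handed on. The class GenericOptionSplitter and its split() method 
are hypothetical names made up for this illustration; this is not the code 
in the patch and not Hadoop's own parser, whose handling of argument order 
and of the separated "-D key=value" form may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only: splits "-Dkey=value" arguments from the
 * arguments meant for the tool's own command-line parser.
 */
final class GenericOptionSplitter {

    /** Holds the extracted properties and the arguments left for the tool. */
    static final class Result {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Only the fused form "-Dkey=value" is recognized in this sketch. */
    static Result split(String[] args) {
        Result result = new Result();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require "-D", a non-empty key, and an '=' sign.
            if (arg.startsWith("-D") && eq > 2) {
                result.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                // Anything else is left untouched for the tool's parser.
                result.remaining.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sample = {
            "--input", "text_output", "--output", "seq_output",
            "-Dmapred.job.queue.name=unfunded"
        };
        Result r = split(sample);
        System.out.println("properties: " + r.properties);
        System.out.println("remaining:  " + r.remaining);
    }
}
```

If the tool's parser receives the full argument array instead of the 
"remaining" list, it sees the -D token as an unknown option, which matches 
the OptionException in the issue description.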

> -Dmapred.job.queue.name=unfunded is not counted in for seq2sparse
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-587
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-587
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.4
>         Environment: RHL Linux 2.6.18 x86_64
>            Reporter: Shige Takeda
>            Priority: Minor
>         Attachments: 0001-added-D-option-support-to-seq2sparse.patch
>
>
> I revisited this and found the -D problem still remains in seq2sparse... 
> $ $MAHOUT_HOME/bin/mahout seq2sparse --input text_output --output seq_output -Dmapred.job.queue.name=unfunded
> Running on hadoop, using HADOOP_HOME=/grid/0/gs/hadoop/current
> HADOOP_CONF_DIR=/grid/0/gs/conf/current
> 11/01/21 20:12:39 ERROR vectorizer.SparseVectorsFromSequenceFiles: Exception
> org.apache.commons.cli2.OptionException: Unexpected -Dmapred.job.queue.name=unfunded while processing Options
>       at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>       at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:137)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Usage:                                                                        
>   
> ...
> The cause is clear: as somebody mentioned (and as I can see from the source 
> code), 
> ./core/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java
> doesn't use ToolRunner, and the config object is not properly propagated to 
> the MR jobs.
> This may be a known issue, but since it was not filed in JIRA, I've filed it 
> just in case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
