[ 
https://issues.apache.org/jira/browse/MAHOUT-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tharindu Rusira updated MAHOUT-1319:
------------------------------------

    Attachment: MAHOUT-1319-custom-filter.patch

Hi [~smarthi],
I applied your patch and tested it against Mahout 0.8. All tests pass, but I 
encountered an exception while running the seqdirectory command with -xm mapreduce. 
I based a custom filter (MyTestFilter.java) on PrefixAdditionFilter and added a 
simple print statement that says "THIS IS A CUSTOM FILTER". (I have attached the 
new filter class as MAHOUT-1319-custom-filter.patch.)
Here are the results I got. It seems Mahout fails to find a no-argument 
constructor for the custom filter (java.lang.NoSuchMethodException: 
org.apache.mahout.text.MyTestFilter.<init>()).
Do you see any reason for this behaviour? 


Mac-mini-Tharindu:mahout-0.8 tkumara$ bin/mahout seqdirectory -i dictionary_dataset -o ~/Desktop/seqfiles -ow -filter org.apache.mahout.text.MyTestFilter
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/mahout-0.8/examples/target/mahout-examples-0.8-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/mahout-0.8/examples/target/dependency/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
Dec 19, 2013 12:33:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], 
--endPhase=[2147483647], 
--fileFilterClass=[org.apache.mahout.text.MyTestFilter], 
--input=[dictionary_dataset], --keyPrefix=[], --method=[mapreduce], 
--output=[/Users/tkumara/Desktop/seqfiles], --overwrite=null, --startPhase=[0], 
--tempDir=[temp]}
Dec 19, 2013 12:33:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Deleting /Users/tkumara/Desktop/seqfiles

Exception in thread "main" java.lang.IllegalStateException: java.lang.NoSuchMethodException: org.apache.mahout.text.MyTestFilter.<init>()
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:68)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:143)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:90)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
Caused by: java.lang.NoSuchMethodException: org.apache.mahout.text.MyTestFilter.<init>()
at java.lang.Class.getConstructor0(Class.java:2810)
at java.lang.Class.getConstructor(Class.java:1718)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:62)
... 13 more

However, my custom filter works fine when run in sequential mode:

Mac-mini-Tharindu:mahout-0.8 tkumara$ bin/mahout seqdirectory -i dictionary_dataset -o ~/Desktop/seqfiles -ow -filter org.apache.mahout.text.MyTestFilter -xm sequential
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/mahout-0.8/examples/target/mahout-examples-0.8-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/mahout-0.8/examples/target/dependency/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
Dec 19, 2013 12:32:43 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], 
--endPhase=[2147483647], 
--fileFilterClass=[org.apache.mahout.text.MyTestFilter], 
--input=[dictionary_dataset], --keyPrefix=[], --method=[sequential], 
--output=[/Users/tkumara/Desktop/seqfiles], --overwrite=null, --startPhase=[0], 
--tempDir=[temp]}
Dec 19, 2013 12:32:44 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Deleting /Users/tkumara/Desktop/seqfiles
Dec 19, 2013 12:32:44 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
THIS IS A CUSTOM FILTER
Dec 19, 2013 12:32:44 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 481 ms (Minutes: 0.008016666666666667)
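
For reference, the NoSuchMethodException for MyTestFilter.<init>() in the mapreduce run is what java.lang.Class.getConstructor() throws when it is called without parameter types on a class that has no public no-argument constructor. The following is a minimal, self-contained sketch of that reflection behaviour only. The class names (NoArgConstructorDemo, ArgsOnlyFilter, NoArgFilter) and the helper canInstantiateWithNoArgs are hypothetical and are not Mahout code; whether MyTestFilter actually lacks such a constructor is an assumption based on the trace, not something confirmed here.

```java
public class NoArgConstructorDemo {

    // Hypothetical filter that declares only a parameterized constructor,
    // so the compiler does not generate a default no-argument constructor.
    public static class ArgsOnlyFilter {
        private final String prefix;

        public ArgsOnlyFilter(String prefix) {
            this.prefix = prefix;
        }
    }

    // Hypothetical filter that also exposes a public no-argument constructor.
    public static class NoArgFilter {
        public NoArgFilter() {
        }
    }

    // Looks up a public no-argument constructor reflectively and invokes it.
    // Returns false when no such constructor exists (NoSuchMethodException).
    static boolean canInstantiateWithNoArgs(Class<?> clazz) {
        try {
            clazz.getConstructor().newInstance();
            return true;
        } catch (NoSuchMethodException e) {
            // Same exception type as in the "Caused by" section of the trace.
            return false;
        } catch (ReflectiveOperationException e) {
            // The constructor exists but could not be invoked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ArgsOnlyFilter: " + canInstantiateWithNoArgs(ArgsOnlyFilter.class));
        System.out.println("NoArgFilter: " + canInstantiateWithNoArgs(NoArgFilter.class));
    }
}
```

Note that getConstructor() only sees public constructors, so a protected or package-private no-argument constructor would fail the same lookup.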

> seqdirectory -filter argument silently ignored when run as MR
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1319
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1319
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.8
>            Reporter: Liz Merkhofer
>            Assignee: Suneel Marthi
>              Labels: seqdirectory, text
>             Fix For: 0.9
>
>         Attachments: MAHOUT-1319-custom-filter.patch, MAHOUT-1319.patch
>
>
> Running "seqdirectory" (Sequence Files from Input Directory) from the command 
> line and specifying a custom filter using the -filter parameter, the argument 
> is ignored and the default "PrefixAdditionFilter" is used on the input. No 
> exception is thrown.
> When the same command is run with "-xm sequential", the filter is found and 
> works as expected.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
