[
https://issues.apache.org/jira/browse/MAHOUT-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tharindu Rusira updated MAHOUT-1319:
------------------------------------
Attachment: MAHOUT-1319-custom-filter.patch
Hi [~smarthi],
I applied your patch and tested it against Mahout 0.8. All tests pass, but I
encountered an exception while running the seqdirectory command in mapreduce
mode (the method used when -xm is not given, as the log below shows).
I based a custom filter (MyTestFilter.java) on PrefixAdditionFilter, adding a
simple print statement that says "THIS IS A CUSTOM FILTER" (the new filter
class is attached as MAHOUT-1319-custom-filter.patch).
Here are the results I got. It seems Mahout fails to find a no-argument
constructor for the custom filter (java.lang.NoSuchMethodException:
org.apache.mahout.text.MyTestFilter.<init>()).
Do you see any reason for this behaviour?
Mac-mini-Tharindu:mahout-0.8 tkumara$ bin/mahout seqdirectory -i dictionary_dataset -o ~/Desktop/seqfiles -ow -filter org.apache.mahout.text.MyTestFilter
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mahout-0.8/examples/target/mahout-examples-0.8-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mahout-0.8/examples/target/dependency/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
Dec 19, 2013 12:33:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.MyTestFilter], --input=[dictionary_dataset], --keyPrefix=[], --method=[mapreduce], --output=[/Users/tkumara/Desktop/seqfiles], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
Dec 19, 2013 12:33:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Deleting /Users/tkumara/Desktop/seqfiles
Exception in thread "main" java.lang.IllegalStateException: java.lang.NoSuchMethodException: org.apache.mahout.text.MyTestFilter.<init>()
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:68)
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
    at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:143)
    at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:90)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
Caused by: java.lang.NoSuchMethodException: org.apache.mahout.text.MyTestFilter.<init>()
    at java.lang.Class.getConstructor0(Class.java:2810)
    at java.lang.Class.getConstructor(Class.java:1718)
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:62)
    ... 13 more
However, my custom filter works fine when run in sequential mode:
Mac-mini-Tharindu:mahout-0.8 tkumara$ bin/mahout seqdirectory -i dictionary_dataset -o ~/Desktop/seqfiles -ow -filter org.apache.mahout.text.MyTestFilter -xm sequential
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mahout-0.8/examples/target/mahout-examples-0.8-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mahout-0.8/examples/target/dependency/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
Dec 19, 2013 12:32:43 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.MyTestFilter], --input=[dictionary_dataset], --keyPrefix=[], --method=[sequential], --output=[/Users/tkumara/Desktop/seqfiles], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
Dec 19, 2013 12:32:44 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Deleting /Users/tkumara/Desktop/seqfiles
Dec 19, 2013 12:32:44 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
THIS IS A CUSTOM FILTER
Dec 19, 2013 12:32:44 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 481 ms (Minutes: 0.008016666666666667)
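The NoSuchMethodException in the mapreduce run can be reproduced in isolation.
The stack trace shows ClassUtils.instantiateAs ending up in
Class.getConstructor, and the missing member is reported as <init>(), i.e. a
public constructor with no parameters. The sketch below assumes (this is not
confirmed by the trace alone) that MyTestFilter, like the filter it was based
on, only declares a constructor that takes arguments. ArgsOnlyFilter,
NoArgFilter and hasPublicNoArgConstructor are hypothetical names used for
illustration only; they are not part of Mahout.

```java
public class ConstructorLookupSketch {

  // Hypothetical stand-in for a filter that only declares a constructor
  // with parameters, so the class has no <init>() at all.
  public static class ArgsOnlyFilter {
    private final String prefix;

    public ArgsOnlyFilter(String prefix) {
      this.prefix = prefix;
    }
  }

  // Hypothetical stand-in for a filter that also exposes a public
  // no-argument constructor.
  public static class NoArgFilter {
    public NoArgFilter() {
    }
  }

  // Returns true if the class has a public constructor taking no arguments.
  // Class.getConstructor() with an empty parameter list is the lookup that
  // throws NoSuchMethodException when no such constructor exists.
  public static boolean hasPublicNoArgConstructor(Class<?> clazz) {
    try {
      clazz.getConstructor();
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // prints "false": only ArgsOnlyFilter(String) is declared
    System.out.println(hasPublicNoArgConstructor(ArgsOnlyFilter.class));
    // prints "true"
    System.out.println(hasPublicNoArgConstructor(NoArgFilter.class));
  }
}
```

If that assumption holds, the difference between the two runs would come from
which constructor each code path asks for, rather than from the filter class
failing to load.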
> seqdirectory -filter argument silently ignored when run as MR
> -------------------------------------------------------------
>
> Key: MAHOUT-1319
> URL: https://issues.apache.org/jira/browse/MAHOUT-1319
> Project: Mahout
> Issue Type: Bug
> Components: Integration
> Affects Versions: 0.8
> Reporter: Liz Merkhofer
> Assignee: Suneel Marthi
> Labels: seqdirectory, text
> Fix For: 0.9
>
> Attachments: MAHOUT-1319-custom-filter.patch, MAHOUT-1319.patch
>
>
> When "seqdirectory" (Sequence Files from Input Directory) is run from the
> command line with a custom filter specified via the -filter parameter, the
> argument is ignored and the default "PrefixAdditionFilter" is used on the
> input. No exception is thrown.
> When the same command is run with "-xm sequential", the filter is found and
> works as expected.