Grant, 

I'm trying to generate the sequence vectors using the SnowballAnalyzer as
opposed to the StandardAnalyzer. I've already gone through this process using
the StandardAnalyzer and plotted the output clusters using the k-means dump
file, so I'm familiar with clustering in Mahout. I'd like to repeat this
exercise with the SnowballAnalyzer, running the following command:

./mahout seq2sparse -s 2 -a
org.apache.lucene.analysis.snowball.SnowballAnalyzer -chunk 100 -i
/home/hadoop/tmp/trecdata-seqfiles/chunk-0 -o
/home/hadoop/tmp/trecdata-vectors -md 1 -x 75 -wt TFIDF -n 0

1) I've placed the lucene-snowball jar in the m2 repository at
/home/delroy/.m2/repository/org/apache/lucene/lucene-snowball/2.9.1

2) I also updated Mahout_CORE/pom.xml to reflect the dependency:
<!-- updated by Delroy to use SnowballAnalyzer -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-snowball</artifactId>
      <version>2.9.1</version>
    </dependency>

3) Then I ran mvn install on Mahout_CORE and on Mahout_ROOT, which
downloaded the lucene-snowball pom and the lucene-snowball pom sha1 to the m2
repository.

Running the command produces the error below. It seems to stem from the
developer code, which incidentally notes that you should not instantiate the
analyzer, at SparseVectorsFromSequenceFiles.java:176. Any suggestions here?

Output:
Exception in thread "main" java.lang.InstantiationException: org.apache.lucene.analysis.snowball.SnowballAnalyzer
        at java.lang.Class.newInstance0(Class.java:357)
        at java.lang.Class.newInstance(Class.java:325)
        at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main()
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
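
One possible reading of the trace above: the failure happens inside
java.lang.Class.newInstance, which throws InstantiationException when the
class has no no-argument constructor. As far as I can tell, the
SnowballAnalyzer constructors in Lucene 2.9 all take a stemmer name, so that
may be the cause, though I have not confirmed it against the Mahout source.
The sketch below reproduces the behaviour without Lucene on the classpath.
NamedStemAnalyzer and EnglishStemAnalyzer are hypothetical stand-ins, not
Lucene or Mahout classes.

```java
public class NewInstanceDemo {

    // Hypothetical stand-in for an analyzer whose only constructor needs an argument.
    public static class NamedStemAnalyzer {
        private final String language;

        public NamedStemAnalyzer(String language) {
            this.language = language;
        }

        public String language() {
            return language;
        }
    }

    // Hypothetical wrapper that adds the no-argument constructor
    // that reflective instantiation relies on.
    public static class EnglishStemAnalyzer extends NamedStemAnalyzer {
        public EnglishStemAnalyzer() {
            super("English");
        }
    }

    // Mimics a driver that instantiates a class reflectively via Class.newInstance,
    // the call that appears in the stack trace.
    @SuppressWarnings("deprecation")
    public static String tryInstantiate(Class<?> cls) {
        try {
            Object instance = cls.newInstance();
            return "ok: " + instance.getClass().getSimpleName();
        } catch (InstantiationException e) {
            // Thrown when the class has no no-argument constructor.
            return "InstantiationException";
        } catch (IllegalAccessException e) {
            return "IllegalAccessException";
        }
    }

    public static void main(String[] args) {
        // prints "NamedStemAnalyzer -> InstantiationException"
        System.out.println("NamedStemAnalyzer -> " + tryInstantiate(NamedStemAnalyzer.class));
        // prints "EnglishStemAnalyzer -> ok: EnglishStemAnalyzer"
        System.out.println("EnglishStemAnalyzer -> " + tryInstantiate(EnglishStemAnalyzer.class));
    }
}
```

If that reading is right, one workaround would be a small class of my own
with a no-argument constructor that delegates to SnowballAnalyzer with a
fixed stemmer name, passing that class's fully qualified name to -a instead.
This is an assumption on my part, not something taken from the Mahout docs.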

PS: I just love the spam filter..won't let me write too many variants of the
word Analyzer because it contains the word anal. 


-----
--cheers
Delroy
-- 
View this message in context: 
http://n3.nabble.com/SnowballAnalyzer-tp729983p732912.html
Sent from the Mahout User List mailing list archive at Nabble.com.
