WikipediaAnalyzer in 0.5 fails because Lucene 3.1's CharArraySet.iterator() 
returns an iterator of "char[]" instead of "String"
-----------------------------------------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-748
                 URL: https://issues.apache.org/jira/browse/MAHOUT-748
             Project: Mahout
          Issue Type: Bug
          Components: Examples
    Affects Versions: 0.5
            Reporter: steven zhuang
            Priority: Minor


In Mahout 0.5, the class org.apache.mahout.analysis.WikipediaAnalyzer fails 
to be constructed.
The statement around WikipediaAnalyzer.java line 38:
   stopSet = (CharArraySet) StopFilter.makeStopSet(Version.LUCENE_31,
        StopAnalyzer.ENGLISH_STOP_WORDS_SET.toArray(
            new String[StopAnalyzer.ENGLISH_STOP_WORDS_SET.size()]));
raises an ArrayStoreException, because the call 
StopAnalyzer.ENGLISH_STOP_WORDS_SET.toArray(String[]) tries to store char[] 
elements in a String[] array.
The cause is that in Lucene 3.1, when the match version is LUCENE_31 or later, 
the CharArraySet.iterator() method returns an iterator over char[] elements 
instead of String elements.
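
The failure mode can be reproduced without Lucene. The sketch below is only an illustration: the class ToArrayRepro and its methods are hypothetical names, and a plain java.util list holding char[] elements stands in for the stop word set.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class ToArrayRepro {

  // Stand-in for a set whose iterator yields char[] elements
  // (an assumption for illustration, not Lucene's CharArraySet).
  static List<Object> charArrayEntries() {
    List<Object> entries = new ArrayList<Object>();
    entries.add("the".toCharArray());
    entries.add("and".toCharArray());
    return entries;
  }

  // Returns true if copying the elements into a String[] throws
  // ArrayStoreException, which is what the report describes.
  static boolean failsToCopyIntoStringArray(Collection<?> entries) {
    try {
      entries.toArray(new String[entries.size()]);
      return false;
    } catch (ArrayStoreException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(failsToCopyIntoStringArray(charArrayEntries()));
  }
}
```

The generic signature of toArray(T[]) does not check element types at compile time, so the mismatch only surfaces at runtime when a char[] is stored into the String[] array.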

see code from CharArraySet.java:

  @Override @SuppressWarnings("unchecked")
  public Iterator<Object> iterator() {
    // use the AbstractSet#keySet()'s iterator (to not produce endless recursion)
    return map.matchVersion.onOrAfter(Version.LUCENE_31) ?
      map.originalKeySet().iterator() : (Iterator) stringIterator();
  }

So in WikipediaAnalyzer() we may need to convert the char[] elements to String 
to make it work.
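
A minimal sketch of such a conversion, assuming the elements are either char[] or some CharSequence. The class StopWordConversion and the helper toStringSet are hypothetical names, not part of Mahout or Lucene, and the input here is a plain list rather than a real CharArraySet.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StopWordConversion {

  // Hypothetical helper: copies each entry into a String, whether the
  // iterator hands back char[] or a CharSequence such as String.
  static Set<String> toStringSet(Iterable<?> entries) {
    Set<String> result = new LinkedHashSet<String>();
    for (Object entry : entries) {
      if (entry instanceof char[]) {
        result.add(new String((char[]) entry));
      } else {
        result.add(String.valueOf(entry));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Stand-in for the stop word set (an assumption for illustration).
    List<Object> entries = new ArrayList<Object>();
    entries.add("the".toCharArray());
    entries.add("and".toCharArray());
    entries.add("of");

    Set<String> words = toStringSet(entries);
    // Copying into a String[] is now safe because every element is a String.
    String[] asArray = words.toArray(new String[words.size()]);
    System.out.println(asArray.length + " stop words: " + words);
  }
}
```

In WikipediaAnalyzer, the resulting String[] could presumably be passed to StopFilter.makeStopSet in place of the failing toArray call, though that has not been verified against Lucene 3.1 here.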

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
