On Jan 3, 2010, at 8:58 AM, Bogdan Vatkov wrote:

> I have stopwords.txt file with 1200+ words, i did not understand this with
> the stemming - you mean my stopwords are somehow ignored due to some
> stemming or ?

No, stopword removal happens before stemming so it is possible that a word that was not stopped was then stemmed to a stopword. I thought you said yesterday you got it straightened out.

>
> On Sun, Jan 3, 2010 at 3:53 PM, Grant Ingersoll <[email protected]> wrote:
>
>> Are you sure you have stopwords and it is not the result of stemming some
>> other word?
>>
>> On Jan 3, 2010, at 7:57 AM, Bogdan Vatkov wrote:
>>
>>> my Solr config is like the default one:
>>>
>>> <field name="msg_body" type="text" termVectors="true" indexed="true"
>>>        stored="true"/>
>>>
>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.StopFilterFactory"
>>>             ignoreCase="true"
>>>             words="stopwords.txt"
>>>             enablePositionIncrements="true"
>>>             />
>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>             protected="protwords.txt"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>             ignoreCase="true" expand="true"/>
>>>     <filter class="solr.StopFilterFactory"
>>>             ignoreCase="true"
>>>             words="stopwords.txt"
>>>             enablePositionIncrements="true"
>>>             />
>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>             protected="protwords.txt"/>
>>>   </analyzer>
>>> </fieldType>
>>
>
> --
> Best regards,
> Bogdan
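
To make the ordering point in the reply concrete, here is a minimal Python sketch. It is not Solr code. It assumes a three-word stop list and a crude suffix stripper standing in for the Snowball stemmer; `STOPWORDS`, `toy_stem` and `analyze` are hypothetical names used only for illustration, and lowercasing is folded into tokenization for brevity.

```python
# Toy model of the index-time order described above: stop filter first, stemmer last.
# Everything here is an illustrative assumption, not the real Solr analysis chain.

STOPWORDS = {"do", "other", "the"}  # hypothetical stop list, not the real stopwords.txt


def toy_stem(token):
    """Crude suffix stripper; a stand-in for Snowball, NOT the real algorithm."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize on whitespace, drop stopwords, then stem what is left."""
    tokens = text.lower().split()
    kept = [t for t in tokens if t not in STOPWORDS]  # stop filter runs before stemming
    return [toy_stem(t) for t in kept]                # stemmer runs last


terms = analyze("the others keep doing the work")
print(terms)                                   # indexed terms
print([t for t in terms if t in STOPWORDS])    # terms that now look like stopwords
```

With this input the stop filter removes both occurrences of "the", but "others" and "doing" pass through it untouched and are only afterwards reduced to "other" and "do". The indexed terms therefore contain entries that match the stop list even though the stop filter worked as configured.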
