On Jan 3, 2010, at 9:13 AM, Bogdan Vatkov wrote:

> Unfortunately it is all classified data I could not share, I will try to
> debug

Can you reproduce w/ generic documents?

>
> On Sun, Jan 3, 2010 at 4:10 PM, Grant Ingersoll <[email protected]> wrote:
>
>> Is there anyway you could zip up a small document set and your Solr home
>> and post somewhere?
>>
>> On Jan 3, 2010, at 9:08 AM, Bogdan Vatkov wrote:
>>
>>> Yesterday I had issues with mapping cluster results to dictionary entries -
>>> it happened that I was using different dictionary - therefore the result
>>> clusters shown really strange results.
>>> But once I fixed all the commands, input/output files, etc. I got very good
>>> result from clusterization POV (I mean clusters are quite correct having in
>>> mind the input documents) but unfortunately the clusters contained mostly
>>> words which I would like to stop - and which words I placed in the
>>> stopwords.txt in Solr (re-indexed, restarted Solr, etc.).
>>>
>>> Where do you suggest I debug the vector creation? Seems Solr respects the
>>> stopwords but not the vector creation (then clustering).
>>>
>>> On Sun, Jan 3, 2010 at 4:02 PM, Grant Ingersoll <[email protected]> wrote:
>>>
>>>> On Jan 3, 2010, at 8:58 AM, Bogdan Vatkov wrote:
>>>>
>>>>> I have stopwords.txt file with 1200+ words, i did not understand this with
>>>>> the stemming - you mean my stopwords are somehow ignored due to some
>>>>> stemming or ?
>>>>
>>>> No, stopword removal happens before stemming so it is possible that a word
>>>> that was not stopped was then stemmed to a stopword.
>>>>
>>>> I thought you said yesterday you got it straightened out.
>>>>
>>>>> On Sun, Jan 3, 2010 at 3:53 PM, Grant Ingersoll <[email protected]> wrote:
>>>>>
>>>>>> Are you sure you have stopwords and it is not the result of stemming some
>>>>>> other word?
>>>>>>
>>>>>> On Jan 3, 2010, at 7:57 AM, Bogdan Vatkov wrote:
>>>>>>
>>>>>>> my Solr config is like the default one:
>>>>>>>
>>>>>>> <field name="msg_body" type="text" termVectors="true" indexed="true"
>>>>>>>        stored="true"/>
>>>>>>>
>>>>>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>>>>>   <analyzer type="index">
>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>     <filter class="solr.StopFilterFactory"
>>>>>>>             ignoreCase="true"
>>>>>>>             words="stopwords.txt"
>>>>>>>             enablePositionIncrements="true"
>>>>>>>             />
>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>>>>>             protected="protwords.txt"/>
>>>>>>>   </analyzer>
>>>>>>>   <analyzer type="query">
>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>>>>             ignoreCase="true" expand="true"/>
>>>>>>>     <filter class="solr.StopFilterFactory"
>>>>>>>             ignoreCase="true"
>>>>>>>             words="stopwords.txt"
>>>>>>>             enablePositionIncrements="true"
>>>>>>>             />
>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>>>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>>>>>             protected="protwords.txt"/>
>>>>>>>   </analyzer>
>>>>>>> </fieldType>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Bogdan
>>>
>>> --
>>> Best regards,
>>> Bogdan
>
> --
> Best regards,
> Bogdan

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
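
A small illustration of the filter-order point made in this thread (stopword removal runs before stemming, so a word that was not stopped can be stemmed into a stopword). This is only a sketch in plain Python and does not call Solr or Lucene. The stopword set, the `toy_stem` plural stripper, and the `stop_after_stemming` switch are all made up for the example, and the word-delimiter step from the quoted config is left out. The second stop pass is one possible workaround under these assumptions, not something proposed in the thread.

```python
# Toy stand-ins only: a real Solr chain would use StopFilterFactory and
# SnowballPorterFilterFactory. Nothing here calls Solr or Lucene.
STOPWORDS = {"other", "the"}  # hypothetical stopwords.txt contents


def toy_stem(token):
    """Crude plural stripper standing in for a real stemmer (assumption)."""
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text, stop_after_stemming=False):
    """Mimic the order of the quoted index analyzer: stop, lowercase, stem."""
    tokens = text.split()                                       # whitespace tokenizer
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]  # stop filter, ignoreCase
    tokens = [t.lower() for t in tokens]                        # lowercase filter
    tokens = [toy_stem(t) for t in tokens]                      # stemmer
    if stop_after_stemming:
        # Hypothetical second stop pass applied to the stemmed forms.
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


print(analyze("the others other clusters"))
print(analyze("the others other clusters", stop_after_stemming=True))
```

In the first call "others" survives the stop filter and is then stemmed to "other", which is itself in the stopword set, so a stopword ends up in the output terms. The second call removes it because the stop list is applied again after stemming.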
