I have an index that stores a score for every pair of words. I would like to use my analyzer, which is a combination of a whitespace tokenizer, a stop-word filter and stemming.
The regular Lucene score takes into account the position of the words. I would like to add another factor to that score: these scores between words. Instead of giving a score of 0 to words that are not equal, I would like to use this index in the calculation.

Is that better explained?

Thanks a lot,
Liat

2009/3/9 Grant Ingersoll <gsing...@apache.org>

> Hmmm, I have some inklings of an idea, but can we take a step back? Can
> you explain the problem you are trying to solve at a higher level (instead
> of the current solution)? I imagine it is something related to
> co-occurrence analysis.
>
> On Mar 8, 2009, at 8:05 AM, liat oren wrote:
>
>> Hi Grant,
>>
>> No, you can only have two words - the score is between two words.
>>
>> "cat dog" and "dog cat" are equivalent; it will actually always be
>> "cat dog" - going by alphabetic order.
>>
>> About the boosting, I read a bit about it - but couldn't find how it can
>> help me, unless I change every appearance of the word dog to also have cat
>> and animal, using the weight of the score.
>> So, for example, every word will appear 10 times as often as it does now -
>> if apple appears once, I will do the boosting so it appears 10 times.
>> If dog appears, then it will also have cat twice (0.2*10) and animal 5
>> times (0.5*10).
>>
>> But I hope to find a better solution.
>>
>> Thanks
>>
>> 2009/3/8 Grant Ingersoll <gsing...@apache.org>
>>
>>> Hi Liat,
>>>
>>> Some questions inline below.
>>>
>>> On Mar 8, 2009, at 5:49 AM, liat oren wrote:
>>>
>>>> Hi,
>>>>
>>>> I have scores between words, for example - dog and animal have a score
>>>> of 0.5 (and not 0), dog and cat have a score of 0.2, etc.
>>>> These scores are stored in an index:
>>>> Doc1: field words: dog animal
>>>>       field score: 0.5
>>>> Doc2: field words: dog cat
>>>>       field score: 0.2
>>>>
>>>> If the user searches for the word dog - I would like that documents that
>>>> contain the word animal or cat will also get a good score (that will take
>>>> into account the 0.5 and 0.2).
>>>
>>> Is it always the case that these come in pairs? In other words, would you
>>> ever have:
>>> field words: dog cat animal
>>> score: 0.9
>>>
>>> Also, is the following equivalent, or would it have a different score:
>>> field words: cat dog
>>> score: 0.2
>>>
>>>> Basically what I do is: for every document in the database, I loop over
>>>> the words that appear in the query (the query is long in a size of an
>>>> article) and for every word that appears in each document I take the
>>>> score from the index mentioned above and calculating a score between the
>>>> query and each document.
>>>>
>>>> Any suggestion how to do it using Lucene search? How to add these values
>>>> to the searcher?
>>>
>>> Thinking...
>>>
>>>> I looked at the boosting option, but couldn't really see how it helps me
>>>> to that matter.
>>>
>>> What "boosting option" did you look at? Can you explain a bit more?
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
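
The scoring described in the first message of this thread (look up a stored score for each pair of words instead of treating unequal words as 0) can be sketched in plain Java, without any Lucene classes. Everything here is an assumption made for illustration: the class PairScoreSketch and its methods are hypothetical names, identical words are given a similarity of 1.0, and the query/document score is taken as the sum, over query words, of the best match found in the document, which the thread does not specify. The expand method only builds a string in the term^boost form accepted by Lucene's classic query parser; whether that produces the ranking wanted here is untested.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of pair-score lookup and query/document scoring. */
final class PairScoreSketch {
    // word -> (related word -> score); stored in both directions so that
    // "cat dog" and "dog cat" resolve to the same score.
    private final Map<String, Map<String, Double>> related = new TreeMap<>();

    /** Registers a score for an unordered pair of words. */
    public void put(String a, String b, double score) {
        related.computeIfAbsent(a, k -> new TreeMap<>()).put(b, score);
        related.computeIfAbsent(b, k -> new TreeMap<>()).put(a, score);
    }

    /** Assumption: equal words score 1.0, unknown pairs score 0.0. */
    public double similarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        return related.getOrDefault(a, Collections.emptyMap()).getOrDefault(b, 0.0);
    }

    /** Assumption: sum over query words of the best similarity to any document word. */
    public double score(List<String> queryWords, Set<String> docWords) {
        double total = 0.0;
        for (String q : queryWords) {
            double best = 0.0;
            for (String d : docWords) {
                best = Math.max(best, similarity(q, d));
            }
            total += best;
        }
        return total;
    }

    /** Builds a query string with related words boosted by their pair score. */
    public String expand(String word) {
        StringBuilder sb = new StringBuilder(word);
        Map<String, Double> neighbours = related.getOrDefault(word, Collections.emptyMap());
        for (Map.Entry<String, Double> e : neighbours.entrySet()) {
            sb.append(' ').append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PairScoreSketch scores = new PairScoreSketch();
        scores.put("dog", "animal", 0.5);
        scores.put("dog", "cat", 0.2);

        Set<String> doc = new HashSet<>(Arrays.asList("animal", "cat"));
        System.out.println(scores.expand("dog"));
        System.out.println(scores.score(Arrays.asList("dog"), doc));
    }
}
```

With the two pairs from the thread, expand("dog") gives dog animal^0.5 cat^0.2, and a document containing only animal and cat scores 0.5 for the query dog. Expanding the query this way avoids repeating words in the indexed documents, at the cost of a longer query when the input is article-sized.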