Hi, Is there any idea of how to make it work? Many thanks, Liat 2009/3/9 liat oren <oren.l...@gmail.com>
> I have an index that has for every two words a score. > I would like my analyzer - that is a combination of whitespace tokenizer, a > stop words analyzer and stemming. > > The regular score of Lucene takes into account the position of the words. > > I would like to add another factor to that score which is these score > between words. > Instead of having score 0 to words that are not equal, I would like to use > this index in the calculation. > > Is it better explained? > > Thanks a lot, > Liat > > 2009/3/9 Grant Ingersoll <gsing...@apache.org> > > Hmmm, I have some inklings of an idea, but can we take a step back? Can >> you explain the problem you are trying to solve at a higher level (instead >> of the current solution)? I imagine it is something related to >> co-occurrence analysis. >> >> >> >> On Mar 8, 2009, at 8:05 AM, liat oren wrote: >> >> Hi Grant, >>> >>> No, you can only have two words - the score is between two words. >>> >>> "cat dog" and "dog cat" is equivalent, it will actually always be "cat >>> dog" >>> - going by alphabetic order. >>> >>> About the boosting, I read a bit about it - but couldn't find how it can >>> help me, unless I change every appearance of the word dog to have also >>> cat >>> and animal using the weight of the score. >>> So, for example, every word will appear 10 times from what it is - if >>> apple >>> appears 1, I will do the boosting so it appears 10 times. >>> If dog appears, then it will also have cat twice (0.2*10) and animal 5 >>> times(0.5*10). >>> >>> But I hope to have another better solution. >>> >>> >>> Thanks >>> 2009/3/8 Grant Ingersoll <gsing...@apache.org> >>> >>> Hi Liat, >>>> >>>> Some questions inline below. >>>> >>>> On Mar 8, 2009, at 5:49 AM, liat oren wrote: >>>> >>>> Hi, >>>> >>>>> >>>>> I have scores between words, for example - dog and animal have a score >>>>> of >>>>> 0.5 (and not 0), dog and cat have a score of 0.2, etc. >>>>> These scores are stored in an index: >>>>> Doc1: field words: dog animal >>>>> field score: 0.5 >>>>> Doc2: field words: dog cat >>>>> field score: 0.2 >>>>> >>>>> If the user searches for the word dog - I would like that documents >>>>> that >>>>> contain the word animal or cat will also get a good score (that will >>>>> take >>>>> into account the 0.5 and 0.2). >>>>> >>>>> >>>> Is it always the case that these come in pairs? In other words, would >>>> you >>>> ever have: >>>> field words: dog cat animal >>>> score: 0.9 >>>> >>>> Also, is the following equivalent, or would it have a different score: >>>> field words: cat dog >>>> score: 0.2 >>>> >>>> >>>> >>>> >>>>> Basically what I do is: for every document in the database, I loop over >>>>> the >>>>> words that appear in the query (the query is long in a size of an >>>>> article) >>>>> and for every word that appears in each document I take the score from >>>>> the >>>>> index mentioned above and calculating a score between the query and >>>>> each >>>>> document. >>>>> >>>>> Any suggestion how to do it using Lucene search? How to add these >>>>> values >>>>> to >>>>> the searcher? >>>>> >>>>> >>>> Thinking... >>>> >>>> >>>> >>>>> I looked at the boosting option, but couldn't really see how it helps >>>>> me >>>>> to >>>>> that matter. >>>>> >>>>> >>>> What "boosting option" did you look at? Can you explain a bit more? >>>> >>>> >>>> -------------------------- >>>> Grant Ingersoll >>>> http://www.lucidimagination.com/ >>>> >>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using >>>> Solr/Lucene: >>>> http://www.lucidimagination.com/search >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>>> >>>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >