Hi Eric, Thanks for the help. Most of it makes sense to me and I amgoing ahead with query boosting during search instead of index time boosting.
Could just tell me how do we arrive at the correct boost values for a query with multiple fields? My understanding is that this will vary as per the data that is being searched. My query after boosting would look something like this: +(i_title:sorting*^6.0 i_description:sorting*^3.5 i_detailedInfo:sorting*^1.5 i_tags:sorting*^1.2) -i_published:false +i_topicsClasses.id:1* The reason for using high values is that I want to make sure that multiple occurrences of the search string in say field 'i_description' should not be treated as more relevant than a single match in the 'i_title' field. Do you have some guideline using which I can arrive at the correct boost value in the above scenario? --Rakesh S > Date: Fri, 21 Dec 2007 17:15:29 -0500 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Boosting Vs Sorting > > See below... > > On Dec 21, 2007 12:50 PM, Rakesh Shete <[EMAIL PROTECTED]> wrote: > > > > > Hi Eric, > > > > >> I don't see how sorting relates to your problem at all.... > > > > Could you just explain how is sorting different from boosting? > > > > I have been trying to figure this out. Going through "Lucene In Action" my > > understanding of sorting is that it will kind of second level of ordering > > after the query results have been scored (Not sure if the relevance > > established > > by scoring is lost in this process). > > > > I often think of sorting as being orthogonal to boosting. They're really > unrelated. > Boosting changes how the scoring of documents work. Sorting ignores scoring > and arranges the results lexically. You can *only* sort on fields that are a > single > token. I'm cheating a little here and you can implement your own > sorts, but that's another story. > > Maybe this would help. Say you were indexing books and wanted the results > presented to the user by title. You could index a "titlesort" field that had > the > title lowercased and all spaces replaced with underscores. Then, you could > sort the result of all books containing "solar energy" by title. Where's the > score > here? The only relevance score has here is that no book in the result set > will > have a score of 0. > > I did, at one point, have to sort by score then sub-sort by title. That is, > present the user with the top scoring documents sub-sorted by title. > This involved using relevancy as the primary sort and sub-sorting by > title. But the problem here is that scores of 0.98374 wouldn't be in the > same bucket as a score of 0.98375. Search the mail archive for > "bucket" and you should see that discussion. > > > > > > > > > >> Is it *really* better for your users to see a low-relevance query > > >> that happens to have the exact words in it before a very-high > > >> ranking but not quite exact response? > > > > Nopes. Thats the last thing my product manager will want. > > > > Lets take an example to simplify this: > > > > I have fields like title, description, tags. Now when I search for a term > > "Indoor Photography" then I would like the results with exact match in > > title to be > > more important than in description or tags. However, if there is an exact > > match in description > > then it should be given more preference than the partial match in title. > > > > Going by the points mentioned below and as per one of your posts > > ( > > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200609.mbox/[EMAIL > > PROTECTED] > > ) > > I understand that I need to specify query time boosting like this: > > > > title:Indoor Photography^2.5 description:Indoor Photography^1.5 tags: > > Indoor Photography^1.2 > > > > That would go some distance towards what you want, but watch the syntax. > You might be better off constructing your own BooleanQuery. The syntax above > would actually parse something like title:Indoor default_field:Photography^ > 2.5. You > need parentheses. Also think about phrase queries.... > > Hope this helps > Erick > > > > > > Let me know if this would help my cause. > > > > Thnx for ur time n the valuable info. > > > > --Rakesh S > > > > > > > > > > > > > Date: Fri, 21 Dec 2007 09:53:02 -0500 > > > From: [EMAIL PROTECTED] > > > To: java-user@lucene.apache.org > > > Subject: Re: Boosting Vs Sorting > > > > > > OK, I'm trying to adjust to a Mac and my keyboard shortcuts sometimes > > > lead me to send the mail when I didn't intend. Sorry about that... > > > > > > So, leaving aside how you form your "similar" query, I *think* you > > > want to form two clauses, your "exact" and your "similar" and > > > boost them individually, combined in a boolean query. > > > > > > This will still interleave the results I think. But it's also a valid > > > question whether this is good or bad. Is it *really* better for your > > > users to see a low-relevance query that happens to have the exact > > > words in it before a very-high ranking but not quite exact response? > > > That, of course it up to your product manager.... > > > > > > If it is really a requirement, it seems to me that you would be able to > > > just form the two queries independently, then just post-process them. > > > One query is the exact version, and the second query is the similar one. > > > Then just combine the results as you please by iterating the hits > > > object for the exact query then following it by the same for the > > similar. > > > > > > I don't see how sorting relates to your problem at all.... > > > > > > Best > > > Erick > > > > > > On Dec 21, 2007 9:46 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > > > > > > > From my perspective, index-time boosting and sorting are apples > > > > and oranges. > > > > > > > > According to a post from Hoss, index-time boosting is a way of > > > > saying that "Field x in this document is more important than > > > > field x in other documents". Query-time boosts are a way of > > > > saying "I care about field X more than field Y across *all* > > > > documents". > > > > > > > > So index time boosting doesn't seem to relate to your problem since > > > > you really want to compare field x across all documents. It seems > > > > that query-time boosting is more relevant. > > > > > > > > So, leaving aside how you form your "similar" q > > > > > > > > > > > > On Dec 20, 2007 10:50 PM, Rakesh Shete < [EMAIL PROTECTED]> > > wrote: > > > > > > > > > > > > > > Hi all, > > > > > > > > > > I am using Hibernate Search (http://www.hibernate.org/410.html) > > which is > > > > > a wrapper around Lucene for performing search over info stored in > > the DB. I > > > > > have questions related to Lucene boosting Vs sorting: > > > > > > > > > > Is index time boosting of documents and fields better than > > specifying > > > > > sorting parameters at search time? > > > > > > > > > > I have been browsing through the Lucene mail archives for an answer > > to > > > > > this. Going through them and reading on stuff related to Lucene > > scoring, my > > > > > understanding is that if I know upfront at index time that the > > relevance > > > > > order of results is based on certain fields, then, it is better to > > have > > > > > index time boosting of documents and fields. Am I right here? > > > > > > > > > > My requirements are like: > > > > > Results having an exact match to the input query string should have > > > > > highest preference followed by an exact match with field1, field2, > > field3 > > > > > and then followed by search query substring (or near match) match > > with > > > > > field1, field2, field3. > > > > > > > > > > Any suggestions are most welcome. > > > > > > > > > > --Rakesh S > > > > > > > > > > _________________________________________________________________ > > > > > Post free property ads on Yello Classifieds now! www.yello.in > > > > > http://ss1.richmedia.in/recurl.asp?pid=219 > > > > > > > > > > > > > > > > _________________________________________________________________ > > Post free property ads on Yello Classifieds now! www.yello.in > > http://ss1.richmedia.in/recurl.asp?pid=219 > > _________________________________________________________________ Post free property ads on Yello Classifieds now! www.yello.in http://ss1.richmedia.in/recurl.asp?pid=219