Hi Erick,

Thanks for the help. Most of it makes sense to me and I am going ahead with
query-time boosting during search instead of index-time boosting.

Could you tell me how to arrive at the correct boost values for a query
with multiple fields? My understanding is that these will vary with the data
that is being searched.

My query after boosting would look something like this:

+(i_title:sorting*^6.0 i_description:sorting*^3.5 i_detailedInfo:sorting*^1.5 
i_tags:sorting*^1.2) -i_published:false +i_topicsClasses.id:1*
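
To keep the boosts easy to tune, the query string could be assembled from a
field-to-boost map rather than typed by hand. The following is only a minimal
sketch in plain Java: the class name BoostedQuerySketch and the helper
boostedClause are hypothetical names used for illustration, not part of Lucene
or Hibernate Search, and the resulting string would still have to be passed to
the query parser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Lucene API): assembles a boosted multi-field clause.
final class BoostedQuerySketch {

    /** Builds "+(f1:term^b1 f2:term^b2 ...)" from an ordered field-to-boost map. */
    static String boostedClause(String term, Map<String, Double> fieldBoosts) {
        StringBuilder sb = new StringBuilder("+(");
        boolean first = true;
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            if (!first) {
                sb.append(' ');
            }
            // Locale.ROOT keeps the decimal point a '.' regardless of platform locale.
            sb.append(e.getKey()).append(':').append(term)
              .append('^').append(String.format(Locale.ROOT, "%.1f", e.getValue()));
            first = false;
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so fields appear as listed.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("i_title", 6.0);
        boosts.put("i_description", 3.5);
        boosts.put("i_detailedInfo", 1.5);
        boosts.put("i_tags", 1.2);

        String query = boostedClause("sorting*", boosts)
                + " -i_published:false +i_topicsClasses.id:1*";
        System.out.println(query);
    }
}
```

With the boosts in one place, trying a different set of values is a one-line
change per field.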

The reason for using high values is that I want to make sure that multiple
occurrences of the search string in, say, the 'i_description' field are not
treated as more relevant than a single match in the 'i_title' field.
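
As a rough way to reason about this, here is a back-of-the-envelope sketch. It
assumes a deliberately simplified model in which a field's contribution is
boost * sqrt(term frequency), which is the tf factor used by Lucene's default
similarity. It ignores idf, length normalization, coord and query
normalization, and any effect of how prefix queries are rewritten, so it is
only an approximation of real scores. The class and method names are
hypothetical.

```java
// Simplified model only: contribution = boost * sqrt(termFrequency).
// Real Lucene scores also include idf, norms, coord and queryNorm.
final class BoostArithmeticSketch {

    static double contribution(double boost, int termFrequency) {
        return boost * Math.sqrt(termFrequency);
    }

    /**
     * Smallest number of occurrences in the lower-boosted field whose
     * simplified contribution exceeds a single match in the higher-boosted field.
     */
    static int occurrencesToOvertake(double highBoost, double lowBoost) {
        if (lowBoost <= 0) {
            throw new IllegalArgumentException("lowBoost must be positive");
        }
        int freq = 1;
        while (contribution(lowBoost, freq) <= contribution(highBoost, 1)) {
            freq++;
        }
        return freq;
    }

    public static void main(String[] args) {
        // i_title boost 6.0 versus i_description boost 3.5
        System.out.println(occurrencesToOvertake(6.0, 3.5));
    }
}
```

Under this simplified model, the ratio of the two boosts squared gives the
number of occurrences the lower-boosted field can absorb, which suggests
choosing boost ratios with the expected term frequencies in mind.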

Do you have a guideline I can use to arrive at the correct boost values
in the above scenario?

--Rakesh S

> Date: Fri, 21 Dec 2007 17:15:29 -0500
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: Boosting Vs Sorting
> 
> See below...
> 
> On Dec 21, 2007 12:50 PM, Rakesh Shete <[EMAIL PROTECTED]> wrote:
> 
> >
> > Hi Eric,
> >
> > >> I don't see how sorting relates to your problem at all....
> >
> > Could you just explain how sorting is different from boosting?
> >
> > I have been trying to figure this out. Going through "Lucene In Action", my
> > understanding of sorting is that it is a kind of second level of ordering
> > applied after the query results have been scored (I am not sure whether the
> > relevance established by scoring is lost in this process).
> >
> 
> I often think of sorting as being orthogonal to boosting. They're really
> unrelated.
> Boosting changes how the scoring of documents works. Sorting ignores scoring
> and arranges the results lexically. You can *only* sort on fields that are a
> single token. I'm cheating a little here, since you can implement your own
> sorts, but that's another story.
> 
> Maybe this would help. Say you were indexing books and wanted the results
> presented to the user by title. You could index a "titlesort" field that had
> the title lowercased and all spaces replaced with underscores. Then, you could
> sort the result of all books containing "solar energy" by title. Where's the
> score here? The only relevance the score has here is that no book in the
> result set will have a score of 0.
> 
> I did, at one point, have to sort by score then sub-sort by title. That is,
> present the user with the top scoring documents sub-sorted by title.
> This involved using relevancy as the primary sort and sub-sorting by
> title. But the problem here is that scores of 0.98374 wouldn't be in the
> same bucket as a score of 0.98375. Search the mail archive for
> "bucket" and you should see that discussion.
> 
> 
> 
> >
> >
> > >> Is it *really* better for your users to see a low-relevance query
> > >> that happens to have the exact words in it before a very-high
> > >> ranking but not quite exact response?
> >
> > Nope. That's the last thing my product manager will want.
> >
> > Let's take an example to simplify this:
> >
> > I have fields like title, description, tags. Now when I search for the term
> > "Indoor Photography", I would like the results with an exact match in
> > title to be more important than in description or tags. However, if there
> > is an exact match in description, then it should be given more preference
> > than a partial match in title.
> >
> > Going by the points mentioned below and as per one of your posts
> > (
> > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200609.mbox/[EMAIL
> >  PROTECTED]
> > )
> > I understand that I need to specify query time boosting like this:
> >
> > title:Indoor Photography^2.5 description:Indoor Photography^1.5 tags:
> > Indoor Photography^1.2
> >
> 
> That would go some distance towards what you want, but watch the syntax.
> You might be better off constructing your own BooleanQuery. The syntax above
> would actually parse something like
> title:Indoor default_field:Photography^2.5. You need parentheses. Also think
> about phrase queries....
> 
> Hope this helps
> Erick
> 
> 
> >
> > Let me know if this would help my cause.
> >
> > Thnx for ur time n the valuable info.
> >
> > --Rakesh S
> >
> >
> >
> >
> >
> > > Date: Fri, 21 Dec 2007 09:53:02 -0500
> > > From: [EMAIL PROTECTED]
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Boosting Vs Sorting
> > >
> > > OK, I'm trying to adjust to a Mac and my keyboard shortcuts sometimes
> > > lead me to send the mail when I didn't intend. Sorry about that...
> > >
> > > So, leaving aside how you form your "similar" query, I *think* you
> > > want to form two clauses, your "exact" and your "similar" and
> > > boost them individually, combined in a boolean query.
> > >
> > > This will still interleave the results I think. But it's also a valid
> > > question whether this is good or bad. Is it *really* better for your
> > > users to see a low-relevance query that happens to have the exact
> > > words in it before a very-high ranking but not quite exact response?
> > > > That, of course, is up to your product manager....
> > >
> > > If it is really a requirement, it seems to me that you would be able to
> > > just form the two queries independently, then just post-process them.
> > > One query is the exact version, and the second query is the similar one.
> > > > Then just combine the results as you please by iterating the hits
> > > > object for the exact query then following it by the same for the
> > > > similar.
> > >
> > > I don't see how sorting relates to your problem at all....
> > >
> > > Best
> > > Erick
> > >
> > > On Dec 21, 2007 9:46 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> > >
> > > > From my perspective, index-time boosting and sorting are apples
> > > > and oranges.
> > > >
> > > > According to a post from Hoss, index-time boosting is a way of
> > > > saying that "Field x in this document is more important than
> > > > field x in other documents". Query-time boosts are a way of
> > > > saying "I care about field X more than field Y across *all*
> > > > documents".
> > > >
> > > > So index time boosting doesn't seem to relate to your problem since
> > > > you really want to compare field x across all documents. It seems
> > > > that query-time boosting is more relevant.
> > > >
> > > > So, leaving aside how you form your "similar" q
> > > >
> > > >
> > > > > On Dec 20, 2007 10:50 PM, Rakesh Shete < [EMAIL PROTECTED]> wrote:
> > > >
> > > > >
> > > > > Hi all,
> > > > >
> > > > > > I am using Hibernate Search (http://www.hibernate.org/410.html) which
> > > > > > is a wrapper around Lucene for performing search over info stored in
> > > > > > the DB. I have questions related to Lucene boosting Vs sorting:
> > > > > >
> > > > > > Is index time boosting of documents and fields better than specifying
> > > > > > sorting parameters at search time?
> > > > > >
> > > > > > I have been browsing through the Lucene mail archives for an answer to
> > > > > > this. Going through them and reading on stuff related to Lucene
> > > > > > scoring, my understanding is that if I know upfront at index time that
> > > > > > the relevance order of results is based on certain fields, then, it is
> > > > > > better to have index time boosting of documents and fields. Am I right
> > > > > > here?
> > > > > >
> > > > > > My requirements are like:
> > > > > > Results having an exact match to the input query string should have
> > > > > > highest preference followed by an exact match with field1, field2,
> > > > > > field3 and then followed by search query substring (or near match)
> > > > > > match with field1, field2, field3.
> > > > >
> > > > > Any suggestions are most welcome.
> > > > >
> > > > > --Rakesh S
> > > > >
> > > >
> > > >
> > > >
> >
> >
