date:20081106

Can Lucene tell which field matched?

2008-11-06 Thread Dora


Hi 

I am new to Lucene and working on a search module for some XML data.

I need to provide a "search all" feature that looks in all XML fields.
Apparently Lucene (2.4.0) does not provide such a "search all" facility, so
I have to build a query that applies my search term to every available XML
element.

Suppose I am searching an address book (a fictitious example for
illustration) made up of contacts (my Lucene documents), each containing
several fields such as name, address, city, ...

My search for paul in the address book will then look like:
name:paul OR address:paul OR city:paul and so on...

Lucene will then tell me which contacts match my query, but is there a way
to know which field(s) matched the request?
The goal is to display the XML with the matching fields highlighted.

I did not find anything like this in Lucene, so it seems that the only way
is to perform an additional search field by field...

So if I have 100 fields per document (I told you my address book was a
fictitious example; the XML I am working on is a little more complex) and
get 100 results that I want to display in a list, this means that I would
need to perform a very large number of additional search requests...

Please tell me that there is a better way to do the job...
-- 
View this message in context: 
http://www.nabble.com/Can-Lucene-tells-which-field-matched---tp20357552p20357552.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

possible score value

2008-11-06 Thread Francisco Borges

Hello,

I have been going through the scoring documentation and code.

I expected Lucene to enforce score values in the range [0,1].
But from what I can gather from the code and docs, score values can be
greater than one.

Does Lucene consider score values greater than 1 valid?

Kind regards,
-- 
Francisco

Re: possible score value

2008-11-06 Thread Anshum

Hi Francisco,

Did you come across:
  scoreNorm = 1.0f / topDocs.getMaxScore();
or something of this sort in Hits?
As far as I know, the initial score can be more than 1, but in the end the
scores are divided by the maxScore of the matched doc set, which sets an
upper limit of 1 (reached by the top-scoring doc).
Hope this clarifies things! :)

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
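
The normalization described in the reply above can be sketched in plain Java, without any Lucene classes. This is only an illustration: the class and method names are hypothetical, and the choice to leave scores untouched when the top score is already at most 1 is an assumption rather than something stated in the thread.

```java
import java.util.Arrays;

final class ScoreNormalizer {
    // Scales raw scores so that the top score becomes 1.0.
    // Assumption: scores are left unchanged when the top score is <= 1.0.
    static float[] normalize(float[] rawScores) {
        float maxScore = 0.0f;
        for (float s : rawScores) {
            maxScore = Math.max(maxScore, s);
        }
        float scoreNorm = 1.0f;
        if (maxScore > 1.0f) {
            scoreNorm = 1.0f / maxScore;
        }
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * scoreNorm;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Raw scores above 1 are scaled relative to the best hit.
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f, 1.0f})));
    }
}
```

Note that a normalized score of this kind is only comparable within one result set, since the divisor changes from query to query.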



Re: BoostingTermQuery scoring

2008-11-06 Thread Grant Ingersoll

Not sure, but it sounds like you are interested in a higher level  
Query, kind of like the BooleanQuery, but then part of it sounds like  
it is per document, right?  Is it that you want to deal with multiple  
payloads in a document, or multiple BTQs in a bigger query?

On Nov 4, 2008, at 9:42 AM, Peter Keegan wrote:

 I'm using BoostingTermQuery to boost the score of documents with terms
 containing payloads (boost value > 1). I'd like to change the scoring
 behavior such that if a query contains multiple BoostingTermQuery terms
 (either required or optional), documents containing more matching terms
 with payloads always score higher than documents with fewer terms with
 payloads. Currently, if one of the terms has a high IDF weight and contains
 a boosting payload but no payloads on other matching terms, it may score
 higher than docs with other matching terms with payloads and lower IDF.

 I think what I need is a way to increase the weight of a matching term in
 BoostingSpanScorer.score() if 'payloadsSeen > 0', but I don't see how to do
 this. Any suggestions?

 Thanks,
 Peter


--
Grant Ingersoll


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











RE: Can Lucene tell which field matched?

2008-11-06 Thread Daan de Wit

Hi,

I have implemented such a solution using the query explanation.
IndexSearcher has an explain(Query query, int document) method that
returns an Explanation object. You can then ask the Explanation object
whether it is a match by calling #isMatch(). You still need to repeat this
for each found document, though.

Daan
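
The per-document, per-field checking loop implied by this approach might look like the following plain-Java sketch. It does not call Lucene: the fieldMatches predicate is a hypothetical stand-in for "explain a single-field query against the document and call isMatch()", and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class MatchingFieldFinder {
    // fieldMatches.test(field, docId) is a hypothetical stand-in for running
    // an explanation of a single-field query against docId and asking isMatch().
    static Map<Integer, List<String>> matchingFields(
            List<Integer> hitDocIds,
            List<String> fields,
            BiPredicate<String, Integer> fieldMatches) {
        Map<Integer, List<String>> result = new LinkedHashMap<Integer, List<String>>();
        for (Integer docId : hitDocIds) {
            List<String> matched = new ArrayList<String>();
            for (String field : fields) {
                if (fieldMatches.test(field, docId)) {
                    matched.add(field);
                }
            }
            result.put(docId, matched);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy stand-in for an index: doc 7 matches in "name",
        // doc 9 matches in "address" and "city".
        BiPredicate<String, Integer> toyCheck = (field, docId) ->
                (docId == 7 && field.equals("name"))
                || (docId == 9 && (field.equals("address") || field.equals("city")));
        System.out.println(matchingFields(
                Arrays.asList(7, 9), Arrays.asList("name", "address", "city"), toyCheck));
    }
}
```

The loop performs one check per displayed hit per field, so the cost grows with the size of the result page rather than with the size of the index.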


RE: Can Lucene tell which field matched?

2008-11-06 Thread Ulrich Vachon

Hi Daan,

Could we see an example of your implementation?

Thx
Ulrich VACHON 


What does Sort.RELEVANCE do?

2008-11-06 Thread Teruhiko Kurosaka

I can specify Sort.RELEVANCE to Searcher.search as in:

hits = searcher.search(q, Sort.RELEVANCE); // Using deprecated method to make it short

What is the real effect of specifying the Sort argument like this?

Does Sort.RELEVANCE sort the hits in order of the score
described in Sect. 3.3, "Understanding Lucene scoring",
of Lucene in Action? If I use the search method without
a sort argument, is that equivalent to specifying
Sort.INDEXORDER?


T. Kuro Kurosaka, Basis Technology
San Francisco, California, U.S.A.
 


Global Field question (thread-safe)?

2008-11-06 Thread Glen Newton

I have a use case where I want all of my documents to have - in
addition to their other fields - a single field=value.
An example use is where I have multiple Lucene indexes that I search
in parallel, but still need to distinguish them.
Index 1: All documents have: source=a1
Index 2: All documents have: source=a2

This is a common use case that has previously been discussed on this list.

The particular question I have is: when I am indexing, can I create a
single Field and use it for all Documents? Note I am in a
multithreaded environment, so many Documents are created and will have
this same Field added to them, and subsequently indexed.

So are there any threading issues with this particular usage?

thanks,

Glen


Re: BoostingTermQuery scoring

2008-11-06 Thread Peter Keegan

Let me give some background on the problem behind my question.

Our index contains many fields (title, body, date, city, etc). Most queries
search all fields, but for best performance, we create an additional
'contents' field that contains all terms from all fields so that only one
field needs to be searched. Some fields, like title and city, are boosted by
a factor of 5. In order to make term boosting work, we create an additional
field 'boost' that contains all the terms from the boosted fields (title,
city).

Then, at search time, a query for "petroleum engineer" gets rewritten to:
(+contents:petroleum +contents:engineer) (+boost:petroleum +boost:engineer).
Note that the two clauses are OR'd so that a term that exists in both fields
will get a higher weight in the 'boost' field. This works quite well at
boosting documents with terms that exist in the boosted fields. However, it
doesn't work properly if excluded terms are added, for example:

(+contents:petroleum +contents:engineer -contents:drilling)
(+boost:petroleum +boost:engineer -boost:drilling)

If a document contains the term 'drilling' in the 'body' field, but not in
the 'title' or 'city' field, a false hit occurs.

Enter payloads and 'BoostingTermQuery'. At indexing time, as terms are added
to the 'contents' field, they are assigned a payload (value=5) if the term
also exists in one of the boosted fields. The 'scorePayload' method in our
Similarity class returns the payload value as a score. The query no longer
contains the 'boost' fields and is simply:

+contents:petroleum +contents:engineer -contents:drilling

The goal is to make the payload technique behave like the 'boost' field
technique. The problem is that the relevance scores of the top hits are
sometimes quite different. The reason is that the IDF value for a given
term in the 'boost' field is often much higher than for the same term in the
'contents' field. This makes sense because the 'boost' field contains a
fairly small subset of the 'contents' field, so the term occurs in fewer
documents there. Even with a payload of '5', a low IDF in the 'contents'
field usually erases the effect of the payload.
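
To make the IDF gap concrete, here is a small plain-Java sketch using the idf formula of the classic DefaultSimilarity, ln(numDocs / (docFreq + 1)) + 1. The index size and document frequencies are invented numbers chosen only for illustration, and the class name is hypothetical.

```java
final class IdfComparison {
    // idf as computed by the classic DefaultSimilarity:
    // ln(numDocs / (docFreq + 1)) + 1
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 1000000;        // hypothetical index size
        int docFreqContents = 100000; // hypothetical: term is common in 'contents'
        int docFreqBoost = 1000;      // hypothetical: term is rare in 'boost'
        System.out.println("idf in 'contents': " + idf(docFreqContents, numDocs));
        System.out.println("idf in 'boost':    " + idf(docFreqBoost, numDocs));
    }
}
```

With these made-up frequencies the same term gets an idf of roughly 3.3 in 'contents' and roughly 7.9 in 'boost', which is the kind of gap that a fixed payload value has to compete with.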

I have found a fairly simple (albeit inelegant) solution that seems to work.
The 'boost' field is still created as before, but it is only used to compute
IDF values for the weight class 'BoostingTermQuery.BoostingTermWeight'. I had
to make this class 'public' so that I could override the IDF value as
follows:

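import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.payloads.BoostingTermQuery;

public class MNSBoostingTermQuery extends BoostingTermQuery {
  public MNSBoostingTermQuery(Term term) {
    super(term);
  }

  protected class MNSBoostingTermWeight extends BoostingTermQuery.BoostingTermWeight {
    public MNSBoostingTermWeight(BoostingTermQuery query, Searcher searcher)
        throws IOException {
      super(query, searcher);
      // Recompute IDF based on the 'boost' field: map each query term to the
      // same text in the 'boost' field.
      HashSet<Term> newTerms = new HashSet<Term>();
      Iterator i = terms.iterator();
      while (i.hasNext()) {
        Term term = (Term) i.next();
        newTerms.add(new Term("boost", term.text()));
      }
      // Replace the IDF that was computed from the 'contents' field.
      this.idf = this.query.getSimilarity(searcher).idf(newTerms, searcher);
    }
  }
}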

Any thoughts about a better implementation are welcome.

Peter





Re: What does Sort.RELEVANCE do?

2008-11-06 Thread Michael McCandless



Section 5.1.2 of LIA also explains this.

Sort.RELEVANCE sorts by relevance score, descending, breaking ties by
sorting by doc ID, ascending, and is the default if you don't specify a
sort order.


Sort.INDEXORDER sorts only by doc ID, which is not the default sort.

Mike
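
The two orderings described in this reply can be expressed as comparators in plain Java. This is only a sketch of the ordering rules, not Lucene's implementation; the Hit class and all other names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class RelevanceOrder {
    // Hypothetical minimal hit: a doc ID plus its relevance score.
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Score descending, ties broken by doc ID ascending.
    static final Comparator<Hit> RELEVANCE = new Comparator<Hit>() {
        public int compare(Hit a, Hit b) {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        }
    };

    // Doc ID ascending only; the score is ignored.
    static final Comparator<Hit> INDEX_ORDER = new Comparator<Hit>() {
        public int compare(Hit a, Hit b) {
            return Integer.compare(a.docId, b.docId);
        }
    };

    // Returns the doc IDs of the hits in the given order.
    static List<Integer> docIds(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, order);
        List<Integer> ids = new ArrayList<Integer>();
        for (Hit h : sorted) {
            ids.add(h.docId);
        }
        return ids;
    }
}
```

For hits with (docId, score) pairs (5, 0.5), (1, 0.2), (3, 0.5) and (2, 0.9), the relevance ordering yields doc IDs 2, 3, 5, 1, while the index ordering yields 1, 2, 3, 5.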


Re: Global Field question (thread-safe)?

2008-11-06 Thread Glen Newton

Thanks!  :-)

2008/11/6 Michael McCandless [EMAIL PROTECTED]:

 The field never changes across all docs?  If so, this will work fine.

 Mike


Re: BoostingTermQuery scoring

2008-11-06 Thread Peter Keegan

I've discovered another flaw in using this technique:

(+contents:petroleum +contents:engineer +contents:refinery)
(+boost:petroleum +boost:engineer +boost:refinery)

It's possible that the first clause will produce a matching doc and none of
the terms in the second clause are used to score that doc. Yet another
reason to use BoostingTermQuery.

Peter



RE: BoostingTermQuery scoring

2008-11-06 Thread Steven A Rowe

Hi Peter,

On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
 I've discovered another flaw in using this technique:
 
 (+contents:petroleum +contents:engineer +contents:refinery)
 (+boost:petroleum +boost:engineer +boost:refinery)
 
 It's possible that the first clause will produce a matching
 doc and none of the terms in the second clause are used to
 score that doc. Yet another reason to use BoostingTermQuery.

I think you could address this, without BTQ, using something like:

  boost:(+petroleum +engineer +refinery)
  (+contents:(+petroleum +engineer +refinery)
   +((*:* -boost:petroleum)
     (*:* -boost:engineer)
     (*:* -boost:refinery)))

The last three lines give you the set of documents that are missing at least
one of the terms in the boost field.  The *:* clause, which denotes a
MatchAllDocsQuery, is necessary to get all documents that don't have a given
term; Lucene's (sub-)query document exclusion operation needs a non-empty set
on which to operate.
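
The set logic behind the (*:* -boost:term) clauses can be illustrated with ordinary Java sets. This is a conceptual sketch only, using made-up doc IDs and hypothetical names; it models the result sets, not how Lucene evaluates the query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class NegativeClause {
    // Models (*:* -field:term): start from all docs, then subtract the docs
    // that contain the term. Without the "all docs" base there is nothing
    // to subtract from.
    static Set<Integer> without(Set<Integer> allDocs, Set<Integer> docsWithTerm) {
        Set<Integer> result = new HashSet<Integer>(allDocs);
        result.removeAll(docsWithTerm);
        return result;
    }

    // Models two optional (OR'd) clauses.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<Integer>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> allDocs = new HashSet<Integer>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> boostPetroleum = new HashSet<Integer>(Arrays.asList(1, 2));
        Set<Integer> boostEngineer = new HashSet<Integer>(Arrays.asList(1, 3));
        // Docs missing at least one of the two terms in the 'boost' field.
        System.out.println(union(without(allDocs, boostPetroleum),
                                 without(allDocs, boostEngineer)));
    }
}
```

In this toy data only doc 1 has both terms in 'boost', so it is the only doc excluded from the "missing at least one term" set.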

On 11/06/2008 at 1:08 PM, Peter Keegan wrote:
 Then, at search time, a query for petroleum engineer gets rewritten
 to: (+contents:petroleum +contents:engineer) (+boost:petroleum
 +boost:engineer). Note that the two clauses are OR'd so that a term that
 exists in both fields will get a higher weight in the 'boost' field.
 This works quite well at boosting documents with terms that exist in the
 boosted fields. However, it doesn't work properly if excluded terms are
 added, for example:
 
 (+contents:petroleum +contents:engineer -contents:drilling)
 (+boost:petroleum +boost:engineer -boost:drilling)
 
 If a document contains the term 'drilling' in the 'body'
 field, but not in the 'title' or 'city' field, a false hit occurs.

I think you could address this problem like this:

  +(boost:(+petroleum +engineer)
    (+contents:(+petroleum +engineer)
     +((*:* -boost:petroleum)
       (*:* -boost:engineer))))
  -contents:drilling

You don't have to include -boost:drilling, because this condition is entailed 
by -contents:drilling.

Steve


Boosting results

2008-11-06 Thread Scott Smith

I'm interested in comments on the following problem.

I have a set of documents.  They fall into 3 categories.  Call these
categories A, B, and C.  Each document has an indexed, non-tokenized
field called "category" which contains A, B, or C (they are mutually
exclusive categories).

All of the documents contain a field called "body" which contains a
bunch of text.  This field is indexed and tokenized.

So, I want to do a search which looks something like:

(category:A OR category:B) AND body:fred

I want all of the category A documents to come before the category B
documents.  Effectively, I want to have the category A documents first
(sorted by relevancy) and then the category B documents after (sorted by
relevancy).

I thought I could do this by boosting the category portion of the query,
but that doesn't seem to work consistently.  I was setting the boost on
the category A term to 1.0 and the boost on the category B term to 0.0.

Any thoughts on how to skin this?

 

Scott

Re: Boosting results

2008-11-06 Thread Erick Erickson

It seems to me that the easiest thing would be to fire two queries and
then just concatenate the results:

category:A AND body:fred

category:B AND body:fred

If you really, really didn't want to fire two queries, you could create
filters on category A and category B and make a couple of
passes through your results, checking whether the returned documents are in
each filter, but you'd still concatenate the results. Actually, in your
specific example you could get by with a single filter on A.

You could also consider a custom scorer that adds 1,000,000 to the score of
every category A document.

How much were you boosting by? What happens if you boost by a very large
factor, as in ridiculously large?

Best
Erick
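
Concatenating the category A results and the category B results amounts to a two-level ordering: category first, then relevance within each category. The plain-Java sketch below applies that ordering to a single merged list of hits. It is a variant of the suggestion above, not Lucene API; the Hit class and method names are hypothetical, and it assumes the category values sort alphabetically in the desired order (A before B).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class CategoryFirstOrder {
    // Hypothetical minimal hit: an ID, its category value and its score.
    static final class Hit {
        final String id;
        final String category;
        final float score;
        Hit(String id, String category, float score) {
            this.id = id;
            this.category = category;
            this.score = score;
        }
    }

    // Returns hit IDs ordered by category (assumed: "A" sorts before "B"),
    // then by score descending within the same category.
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                int byCategory = a.category.compareTo(b.category);
                return byCategory != 0 ? byCategory : Float.compare(b.score, a.score);
            }
        });
        List<String> ids = new ArrayList<String>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Because the category is compared before the score, even a very high-scoring category B document can never overtake a category A document, which is the guarantee that a finite boost alone does not give.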
