query parser

2006-03-08 Thread Raghavendra Prabhu
I want to use QueryParser to parse my query string.

Instead of a single default field, I want each term to be searched across
a group of fields.

Can anyone let me know how to do this?

For example, if my query is

new books

then "new" should be searched in both fields (content and title), and

"books" should be searched in both fields (content and title).


How do I accomplish this, and how can I extend QueryParser to do the above?


Re: query parser

2006-03-08 Thread Rainer Dollinger
Take a look at the class MultiFieldQueryParser, I think it does exactly
what you want.

GR,
Rainer


 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Get only count

2006-03-08 Thread anton
Does this mean that the collect method can be called for a document with
score = 0?

-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 07, 2006 6:35 PM
To: java-user@lucene.apache.org
Subject: Re: Get only count
Importance: High

On 3/7/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> Can a matching document have a score equal to zero?

Yes.  Scorers don't generally use score to determine if a document
matched the query.
Scores <= 0.0f are currently screened out at the top level search
functions, but not when you use a HitCollector yourself.

-Yonik


> -----Original Message-----
> From: Yonik Seeley [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 07, 2006 6:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: Get only count
> Importance: High
>
> On 3/7/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> > You added if (score > 0.0f), but the Javadoc says that
> > HitCollector.collect(int,float) is called for every non-zero scoring.
>
> That should probably read "is called for every matching document".
>
> -Yonik




Re: query parser

2006-03-08 Thread Raghavendra Prabhu
Hi Rainer

Thanks. I have one more question.

How do I set different boosts for each field using the query parser?

Can I set a different boost for each field?

Rgds
Prabhu





Re: Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-08 Thread Chris Hostetter

: Thanks Chris for making it clear, I had read the comment but I had not
: understood that it implied incompatibility. But will the code be preserved
: in Lucene 2.0, in light of the comment contained in the Lucene 1.9.1
: announcement ?

I don't really know, it's currently being discussed in LUCENE-500...

http://issues.apache.org/jira/browse/LUCENE-500


-Hoss





Re: Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-08 Thread George Washington
Thanks Chris, I think I'll opt for re-creating the index now, using the new
1.9.1 code. Sooner or later, it seems to me, the deprecated code will be
removed anyway. Better to face the pain now than later; it also lets me take
advantage of the new date resolution features. Even though I can live without
them, they can be a performance boost.


Victor




From: Chris Hostetter [EMAIL PROTECTED]
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3
Date: Wed, 8 Mar 2006 01:03:43 -0800 (PST)

: Thanks Chris for making it clear, I had read the comment but I had not
: understood that it implied incompatibility. But will the code be preserved
: in Lucene 2.0, in light of the comment contained in the Lucene 1.9.1
: announcement ?

I don't really know, it's currently being discussed in LUCENE-500...

http://issues.apache.org/jira/browse/LUCENE-500


-Hoss





MuliField Query Parser

2006-03-08 Thread Raghavendra Prabhu
Hi

I need different boosts for the fields that we define in
MultiFieldQueryParser.

How can this be accomplished?


Rgds
Prabhu


Re: MuliField Query Parser

2006-03-08 Thread Rainer Dollinger
You could try to inherit from MultiFieldQueryParser:

public class BoostableMultiFieldQueryParser extends MultiFieldQueryParser {

    // TODO: add constructors of super class

    public static Query parse(String query, String[] fields,
            BooleanClause.Occur[] flags, Analyzer analyzer, float[] boosts)
            throws ParseException {
        if (fields.length != flags.length)
            throw new IllegalArgumentException("fields.length != flags.length");
        BooleanQuery bQuery = new BooleanQuery();
        for (int i = 0; i < fields.length; i++) {
            QueryParser qp = new QueryParser(fields[i], analyzer);
            Query q = qp.parse(query);

            // ATTENTION: the only new line !!!
            q.setBoost(boosts[i]);

            bQuery.add(q, flags[i]);
        }
        return bQuery;
    }
}

I copied the code of the method parse(String, String[], BooleanClause.Occur[],
Analyzer) and added the parameter float[] boosts.
I marked the only line I have inserted.
You have to add the constructors from the super class to get the class
to compile.

I didn't have time to test this idea; if you try it, please post a reply
saying whether it works.

Rainer
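
An alternative that avoids subclassing might be to put the boosts into the query string itself, using the field:(...)^boost form of the standard query syntax. The sketch below is plain Java with no Lucene dependency. The class and method names are hypothetical, and it assumes the user text contains no characters that would need escaping.

```java
/** Sketch: expands one user query across several fields with per-field boosts. */
class BoostedQueryStrings {

    /**
     * Builds a query string such as "title:(new books)^2.0 content:(new books)^1.0".
     * The user text is inserted verbatim; escaping special characters is not handled.
     */
    static String build(String userQuery, String[] fields, float[] boosts) {
        if (fields.length != boosts.length) {
            throw new IllegalArgumentException("fields.length != boosts.length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(' '); // clauses are separated by a single space
            }
            sb.append(fields[i]).append(":(").append(userQuery).append(")^").append(boosts[i]);
        }
        return sb.toString();
    }
}
```

The resulting string would then be handed to an ordinary QueryParser, so no new parser class is needed; the trade-off is that the raw user input has to be sanitised first.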






Re: RangeQuery and RangeFilter

2006-03-08 Thread mark harwood
See
http://wiki.apache.org/jakarta-lucene/FilteringOptions


--- Anton Potehin [EMAIL PROTECTED] wrote:

> Which is faster, RangeQuery or RangeFilter?
 
 







1.4.3 and 64bit support? out of memory??

2006-03-08 Thread zzzzz shalev
hi all,

I've been trying to load a 6GB index on Linux (16GB RAM) but am having no
success.

I wrote a program that allocates memory, and it was able to allocate as much
RAM as I requested (I stopped at 12GB).

However, I am receiving the following stack trace when trying to load the
indexes:

JVMDUMP013I Processed Dump Event "uncaught", detail "java/lang/OutOfMemoryError".
Exception in thread "main" java.lang.OutOfMemoryError
    at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:82)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:45)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:112)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
    at org.apache.lucene.store.Lock$With.run(Lock.java:109)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:106)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:43)

Any ideas?

Thanks in advance,
   
   
   



Re: Get only count

2006-03-08 Thread Paul Elschot
On Wednesday 08 March 2006 09:25, [EMAIL PROTECTED] wrote:
> Does this mean that the collect method can be called for a document with
> score = 0?

The collect() method is called after next() on the top level Scorer has 
returned true. In between score() is called on that Scorer to provide the 
score value, but the score value is not tested.
Most Scorers give only positive score values for matching documents.

This is implemented in the IndexSearcher.search(...) and
Scorer.score(HitCollector) methods.

Regards,
Paul Elschot
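
The behaviour described above can be illustrated with a small simulation. This is only a sketch: SketchHitCollector is a hypothetical stand-in for the collect(int, float) callback and is not the Lucene class, and scoreAll merely imitates a scoring loop that passes every matching document on without testing its score.

```java
/** Hypothetical stand-in for a collect(int, float) callback; not the Lucene class. */
abstract class SketchHitCollector {
    abstract void collect(int doc, float score);
}

/** Counts every collected document, and separately those with a positive score. */
class CountingCollector extends SketchHitCollector {
    int matches;
    int positive;

    @Override
    void collect(int doc, float score) {
        matches++;              // every matching document is counted
        if (score > 0.0f) {
            positive++;         // the filter a caller must add itself
        }
    }
}

class CollectorSketch {
    /** Imitates a scoring loop: each matching doc is collected, the score is not tested. */
    static void scoreAll(float[] scoresOfMatchingDocs, SketchHitCollector collector) {
        for (int doc = 0; doc < scoresOfMatchingDocs.length; doc++) {
            collector.collect(doc, scoresOfMatchingDocs[doc]);
        }
    }
}
```

If only a count of positively scoring documents is wanted, the score test has to live inside the collector, as in the positive counter here.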

 



Re: 1.4.3 and 64bit support? out of memory??

2006-03-08 Thread Dan Armbrust

z shalev wrote:

> hi all,
>
> i've been trying to load a 6GB index on linux (16GB RAM) but am having no
> success.
>
> i wrote a program that allocates memory and it was able to allocate as much
> RAM as i requested (stopped at 12GB)
   


Was your program that got up to 12GB of memory written in Java, and 
using the same jvm with the same -Xmx settings as your lucene program?


Dan


--

Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/




RE: Lucene Scoring

2006-03-08 Thread Pasha Bizhan
Hi, 

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>
> Anyone have a doc or something that would allow me to explain
> this to execs? A "Lucene Scoring for Dummies"
> idea... explaining a math algorithm to an exec or someone with no
> knowledge is not that easy :)

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html

And the Lucene book, section 3.3: Understanding Lucene scoring
http://lucenebook.com/search?query=scoring

Pasha Bizhan





Re: 1.4.3 and 64bit support? out of memory??

2006-03-08 Thread zzzzz shalev
yes,
  100%
  


Does Lucene support on-disk search?

2006-03-08 Thread Xiaocheng Luan
Hi,
   
I heard that Lucene loads the index into memory to do a search, which does
not sound quite right to me. I would not be surprised if Lucene is smart
enough to load the index into memory when that is feasible, but I'd be
surprised if it ALWAYS loads the index into memory to do the search, which
I think would cause scalability problems.

Could someone clarify this? Thanks!

By the way, could someone please share some experience with Lucene's
performance? Say, on a data set of a few gigabytes with a reasonable query,
what would the average search time be?
   
  Xiaocheng



Re: Lucene Scoring

2006-03-08 Thread markharw00d

[EMAIL PROTECTED] wrote:

> Anyone have a doc or something that would allow me to explain this to execs?


Roughly speaking:

* Documents containing *all* the search terms are good
* Matches on rare words are better than for common words
* Long documents are not as good as short ones
* Documents which mention the search terms many times are good

...although there are more factors you can choose to add,  like 
emphasising individual query terms or individual docs in the index.


Cheers
Mark
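
The four rules of thumb above can be made concrete with toy versions of the individual scoring factors. The formulas below are assumptions modelled on the classic default similarity (square-root term frequency, logarithmic inverse document frequency, inverse square-root length normalisation, and a coordination factor); the class name is hypothetical, and "length" means the number of terms in the searched field, not in the whole document.

```java
/** Toy scoring factors; assumed formulas, not a copy of any library class. */
class ScoringSketch {

    /** More occurrences of a term help, with diminishing returns. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Rare terms (low docFreq) weigh more than common ones. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /** Short fields score higher than long ones. */
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    /** Documents matching more of the query terms score higher. */
    static double coord(int matchedTerms, int queryTerms) {
        return (double) matchedTerms / queryTerms;
    }
}
```

For instance, with these formulas a term occurring in 1 of 1000 documents gets a larger idf than one occurring in 500 of 1000, and a 10-term field gets a larger lengthNorm than a 1000-term field.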













Re: Does Lucene support on-disk search?

2006-03-08 Thread Grant Ingersoll
Lucene _can_ load the index into memory, but it doesn't have to. For
further details, see the Javadocs on RAMDirectory versus FSDirectory.
I think you will find it has good performance on a few gigs of data.
Results, of course, vary based on what you are asking it to do and what
kind of hardware you have.


-Grant

  


--
--- 
Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 






Re: 1.4.3 and 64bit support? out of memory??

2006-03-08 Thread Chris Hostetter

:   i am receiving the following stack trace:
:
:   JVMDUMP013I Processed Dump Event "uncaught", detail "java/lang/OutOfMemoryError".
: Exception in thread "main" java.lang.OutOfMemoryError
: at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:82)

Is it possible that parts of your application are eating up all of the
heap in your JVM before this exception is encountered?  Possibly by
opening the index many times without closing it?

More specifically, if you write a 4 line app that does nothing but open
your index and then close it again, do you get an OOM? ...

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Searcher;

public class Main {
  public static void main(String[] args) throws Exception {
    // Open the index at the given path, then close it straight away.
    Searcher s = new IndexSearcher("/your/index/path");
    s.close();
  }
}



-Hoss





Re: Lucene Scoring

2006-03-08 Thread Chris Hostetter

: Roughly speaking:
:
: * Documents containing *all* the search terms are good
: * Matches on rare words are better than for common words
: * Long documents are not as good as short ones
: * Documents which mention the search terms many times are good

Be wary of the distinction between "term" and "word" and how that affects
statements like "Long documents are not as good as short ones" ... If you
have a title field and a body field, and one document has a really long body
but a very short title, then a search on the title isn't going to be
penalized by the length of the body ... you have to choose your words
carefully.






-Hoss





Re: 1.4.3 and 64bit support? out of memory??

2006-03-08 Thread zzzzz shalev
hey chris,

I will check and let you know, just to make sure.

Basically, I see the OS allocating memory (up to about 4GB) while loading the
indexes into memory, and then a crash in the TermInfosReader class. What I
noticed was that the crash occurred when Lucene tried to create a Term array
with the following code:

  new Term[indexSize]

I assume that, since this is an array, Java was trying to allocate consecutive
blocks in memory, and this is hard to find even on a machine with 16GB of RAM,
especially since (if I'm not mistaken) indexSize here is the termEnum size
(which in my case is rather large).

I will get back to you about the one-liner. If you have any other thoughts I'd
be extremely happy to hear them, as this problem is a major roadblock.

Thanks a million
   
  


Re: 1.4.3 and 64bit support? out of memory??

2006-03-08 Thread Daniel Noll

z shalev wrote:

> hey chris,
>
> i will check and let you know just to make sure,
>
> basically i see the OS allocating memory (up to about 4GB) while
> loading the indexes to memory and then crashing on the
> TermInfosReader class.  what i noticed was that the crash occurred
> when lucene tried to create a Term array with the following code
>
> new Term[indexSize]
>
> i assume, since this is an array java was trying to allocate
> consecutive blocks in memory and this is hard to find, even in a 16
> GB RAM machine, especially since (if im not mistaken) indexSize here
> is the termEnum size (which in my case is rather large)


That's not exactly how memory works.  When a program looks to allocate a 
chunk of memory, the chunk is allocated from the virtual memory space.


In the case of Windows XP on a 32-bit machine, the maximum contiguous 
virtual memory is somewhere just below 2GB in a best-case scenario 
(usually it's more like 1.5GB) regardless of the amount of physical RAM.


In the case of a 64-bit machine, though, the virtual memory space is 
much, much larger than your 16GB of RAM, so there should be no problem 
allocating ridiculous amounts of memory (or for that matter, memory 
mapping ridiculously large files to a byte buffer.)


It wasn't mentioned explicitly so it's probably worth checking... you 
are using the 64-bit JVM, right?  If you were still using the 32-bit 
JVM, that would certainly exhibit this sort of behaviour.


Daniel


--
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/                         Fax: +61 2 9212 6902
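
One way to check the point about 32-bit versus 64-bit JVMs is to have the JVM report its own architecture and heap ceiling, running it with the same launcher and -Xmx setting as the failing program. This is a minimal sketch with a hypothetical class name. The "sun.arch.data.model" property is vendor-specific and is assumed here only as a hint; on other JVMs it may be absent and print as null.

```java
/** Prints the architecture and maximum heap the current JVM reports. */
class JvmMemoryCheck {

    /** Returns a one-line summary of the JVM architecture and heap ceiling. */
    static String describe() {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        // Vendor-specific property; may be null on JVMs that do not define it.
        String dataModel = System.getProperty("sun.arch.data.model");
        return "os.arch=" + System.getProperty("os.arch")
                + " dataModel=" + dataModel
                + " maxHeapMB=" + maxHeapMb;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported maximum heap is far below the index size, or the architecture is a 32-bit one, the OutOfMemoryError would be expected regardless of how much physical RAM the machine has.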




Lucene Ranking/scoring

2006-03-08 Thread Yang Sun
Hi,
Just wondering how I can rank search results by a combination of fields. I
know there is a multi-field sort, but it is just a sorting method: results
are sorted by the first field, then by the second field, and so on.
What I need is a weighted combination. For example, I want to assign a
weight of 2 to a title match, 1.5 to an abstract match, and 3 to a date match
(i.e. how close the last modified date is). The final score would be
2*inTitle+1.5*inAbstract+3*date, instead of sorting by date and then sorting
by title within the same date.
I checked Lucene's Score, Similarity, and ScoreDocComparator and can't find an
answer. Implementing ScoreDocComparator seems the closest, but it can only
sort the results by one field. The Field boost does not work because the
boosting factor has to be set at index time. What I need is to set the
weight at query time.
Please help. Thanks.

Yang





Re: Lucene Ranking/scoring

2006-03-08 Thread Yonik Seeley
Hi Yang,

Boosting works at query time as well as index time.
If you are using the QueryParser, specify boosts like so:
title:foo^2 abstract:foo^1.5 date:mydate^3

If you are building queries programmatically, then use the Query.setBoost() method.

That will boost relative to how a non-boosted query would score, but
keep in mind that you still have tf/idf factors in the score.  If you
need to get rid of the tf/idf factors, either write your own
ScoreDocComparator, or use a FunctionQuery.

-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server





Atomic index/search for a phrase

2006-03-08 Thread Urvashi Gadi

Hi All,

I am trying to index and search a phrase (multiple words separated by
spaces). How should I index it so that it remains atomic? I have
observed that if I index the phrase as a keyword, Lucene doesn't let me
retrieve the phrase in a search.


Please advise.

Urvashi







RE: Lucene Ranking/scoring

2006-03-08 Thread Yang Sun
Hi Yonik,
Thanks very much for your suggestion. The query boost works great for
keyword matching. But in my case, I need to rank the results by date and
title. For example, title:foo^2 abstract:foo^1.5 date:2004^3 will only boost
documents with date=2004. What I need is a boost based on the distance from
the specified date, which means 2003 will rank better than 2002,
2002 better than 2001, etc.
I implemented a customized ScoreDocComparator class, which works fine for one
field, but I ran into trouble when trying to combine other fields.
I'm still looking at FunctionQuery; I don't know yet whether I can figure
something out.
Any suggestions? Thanks.

Yang
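
To make the requirement concrete, here is the weighted combination written out as plain Java, independent of Lucene. It is only a sketch of the arithmetic: the class name, the boolean match flags, and the 1/(1 + distance) closeness function are assumptions chosen for illustration, and other decay functions would work equally well.

```java
/** Sketch of the 2*inTitle + 1.5*inAbstract + 3*date combination. */
class WeightedRankSketch {

    /** Closeness in (0, 1]: 1.0 for the target year, smaller as the distance grows. */
    static double dateCloseness(int docYear, int targetYear) {
        return 1.0 / (1.0 + Math.abs(docYear - targetYear));
    }

    /** Combines the title match, abstract match and date closeness with fixed weights. */
    static double score(boolean inTitle, boolean inAbstract, int docYear, int targetYear) {
        return 2.0 * (inTitle ? 1 : 0)
                + 1.5 * (inAbstract ? 1 : 0)
                + 3.0 * dateCloseness(docYear, targetYear);
    }
}
```

With a target year of 2004, a 2003 document then outranks an otherwise identical 2002 document, which is the ordering asked for above.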


-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: March 8, 2006 21:35
To: java-user@lucene.apache.org
Subject: Re: Lucene Ranking/scoring

Hi Yang,

Boosting works at query time as well as index time.
If you are using the QueryParser, specify boosts like so:
title:foo^2 abstract:foo^1.5 date:mydate^3

If you are building queries programmatically, then use the Query.setBoost()
method.

That will boost relative to how a non-boosted query would score, but
keep in mind that you still have tf/idf factors in the score.  If you
need to get rid of the tf/idf factors, either write your own
ScoreDocComparator, or use a FunctionQuery.

-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server





RangeQuery, FilterdQuery and HitCollector

2006-03-08 Thread Youngho Cho
Hello,

I would like to use a Filter for a range query (to avoid a potential
TooManyClauses exception) and found

http://wiki.apache.org/jakarta-lucene/FilteringOptions

The wiki says that FilteredQuery is the best option.
Interestingly, when I use that option with a HitCollector, my FilteredQuery
test fails.

Am I missing something, is FilteredQuery with a HitCollector not allowed, or
is this a bug?


Please refer to my test code below.

--
import junit.framework.TestCase;

import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumberTools;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.FilteredQuery;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeFilter;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;
import java.io.Serializable;

import java.util.Collection;
import java.util.HashSet;

public class FilteredRangeQueryTest extends TestCase
{
    private Directory ramDir;

    protected void setUp() throws Exception
    {
        ramDir = new RAMDirectory();
        addDocuments();
    }

    public void testRangeQuery()
        throws Exception
    {
        IndexSearcher searcher = new IndexSearcher(ramDir);

        Filter filter = RangeFilter.Less("num", NumberTools.longToString(1L));

        Term term = new Term("attid", NumberTools.longToString(113L));
        Query query = new TermQuery(term);

        Hits hits = searcher.search(query, filter);

        assertEquals(0, hits.length());

        HitCollector hitCollector = new TestHitCollector();

        ((TestHitCollector) hitCollector).setSearcher(searcher);

        // This test passes
        searcher.search(query, filter, hitCollector);
        assertEquals(0, ((TestHitCollector) hitCollector).getIds().size());
    }

    public void testFilteredQuery()
        throws Exception
    {
        IndexSearcher searcher = new IndexSearcher(ramDir);

        Filter filter = RangeFilter.Less("num", NumberTools.longToString(1L));

        Term term = new Term("attid", NumberTools.longToString(113L));
        Query query = new TermQuery(term);

        FilteredQuery fq = new FilteredQuery(query, filter);

        Hits hits = searcher.search(fq);

        assertEquals(0, hits.length());

        HitCollector hitCollector = new TestHitCollector();

        ((TestHitCollector) hitCollector).setSearcher(searcher);

        // This test fails
        searcher.search(fq, hitCollector);
        assertEquals(0, ((TestHitCollector) hitCollector).getIds().size());
    }

    // Indexes a single document whose "num" value (1000) lies outside the filter range.
    private void addDocuments()
        throws IOException
    {
        IndexWriter writer = new IndexWriter(ramDir, new CJKAnalyzer(), true);

        Document doc = new Document();

        doc.add(Field.Keyword("num", NumberTools.longToString(1000L)));
        doc.add(Field.Keyword("attid", NumberTools.longToString(113L)));
        doc.add(Field.Keyword("itid", "111"));
        writer.addDocument(doc);

        writer.optimize();
        writer.close();
    }

    // Collects the "itid" value of every document passed to collect().
    public class TestHitCollector extends HitCollector implements Serializable
    {
        private transient Searcher searcher;
        private transient Collection res;

        public TestHitCollector()
        {
        }

        public void setSearcher(Searcher searcher)
        {
            res = new HashSet();
            this.searcher = searcher;
        }

        public void collect(int i, float v)
        {
            try
            {
                final Document doc = searcher.doc(i);

                res.add(doc.get("itid"));
            }
            catch (IOException e)
            {
                // ignored
            }
        }

        public Collection getIds()
        {
            return res;
        }
    }
}


Thanks,

Youngho