Re: Is RangeQuery more efficient than DateFilter?

2004-03-29 Thread Erik Hatcher
On Mar 29, 2004, at 4:25 AM, Kevin A. Burton wrote:
> I have a 7G index.  A query for a random term comes back fast (300ms)
> when I'm not using a DateFilter, but when I add the DateFilter it takes
> 2.6 seconds.  Way too long.  I assume this is because the filter API
> does a post-process, so it has to read fields off disk.
>
> Is it possible to do this with a RangeQuery?  For example, you could
> create a "days since January 1, 1970" field and do a range query
> between 5 and 10... and then add the original field as well.

Are you keeping DateFilter around for more than one search?  The 
drawback to pure DateFilter is that it does not cache, so each search 
re-enumerates the terms in the range.  In fact, DateFilter by itself is 
practically of no use, I think.
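Kevin's "days since 1970" idea only works with a term range if the encoded terms sort in numeric order, because Lucene compares term text as strings ("10" sorts before "9"). The toy Java model below is not Lucene code: it zero-pads the day count (the six-digit width is an assumption) and uses a sorted map as a stand-in for the term dictionary. Its `bits` method (a hypothetical name) walks every term in the range on each call, which is the repeated work Erik describes for an uncached DateFilter.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a day-range filter; NOT Lucene code. The class, its methods and
// the 6-digit padding width are hypothetical choices made for illustration.
final class DayRangeFilterSketch {
    private static final LocalDate EPOCH = LocalDate.of(1970, 1, 1);
    private static final int PAD_WIDTH = 6; // assumed: enough digits for dates until about year 4700

    // Sorted term text -> ids of documents holding that term (stand-in for a term dictionary).
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();
    private final int maxDoc;
    int termsVisited = 0; // counts term-enumeration work across all calls

    DayRangeFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Days since 1970-01-01, zero-padded so string order equals numeric order.
    static String encodeDay(LocalDate date) {
        long days = ChronoUnit.DAYS.between(EPOCH, date);
        if (days < 0) {
            throw new IllegalArgumentException("dates before 1970 are not handled: " + date);
        }
        return String.format("%0" + PAD_WIDTH + "d", days);
    }

    void addDocument(int docId, LocalDate date) {
        postings.computeIfAbsent(encodeDay(date), k -> new ArrayList<>()).add(docId);
    }

    // Walks every term in [lower, upper] on each call, like an uncached date filter,
    // and sets the bit of every document found under those terms.
    BitSet bits(LocalDate lower, LocalDate upper) {
        BitSet result = new BitSet(maxDoc);
        Map<String, List<Integer>> range =
                postings.subMap(encodeDay(lower), true, encodeDay(upper), true);
        for (List<Integer> docs : range.values()) {
            termsVisited++;
            for (int doc : docs) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

Calling `bits` twice with the same range doubles `termsVisited`: nothing is remembered between searches.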

If you have a set of canned date ranges, there are two approaches worth
considering:  DateFilter wrapped by a CachingWrapperFilter, or a
RangeQuery wrapped in a QueryFilter (which does cache).

Performance-wise, I don't really think there is much (any?) difference 
in these two approaches, so take your pick.  Once the bit sets are 
cached in a filter, searches will be quite fast.
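Once a filter's bits exist, applying them is cheap: roughly speaking, the searcher only tests each candidate document's bit. The sketch below models that step with plain `java.util.BitSet`; `FilteredSearchSketch` and `applyFilter` are hypothetical names, and this is not Lucene's actual search loop.

```java
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical helper, not Lucene's API: keeps the query's candidate documents
// whose bit is set in an already-computed (e.g. cached) filter bit set.
final class FilteredSearchSketch {
    static int[] applyFilter(int[] queryHits, BitSet filterBits) {
        // One constant-time bit test per candidate; no terms are enumerated here.
        return IntStream.of(queryHits).filter(filterBits::get).toArray();
    }
}
```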

	Erik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Is RangeQuery more efficient than DateFilter?

2004-03-29 Thread Erik Hatcher
On Mar 29, 2004, at 8:41 AM, Erik Hatcher wrote:
> On Mar 29, 2004, at 4:25 AM, Kevin A. Burton wrote:
>> I have a 7G index.  A query for a random term comes back fast (300ms)
>> when I'm not using a DateFilter, but when I add the DateFilter it
>> takes 2.6 seconds.  Way too long.  I assume this is because the
>> filter API does a post-process, so it has to read fields off disk.
>>
>> Is it possible to do this with a RangeQuery?  For example, you
>> could create a "days since January 1, 1970" field and do a range
>> query between 5 and 10... and then add the original field as well.
>
> Are you keeping DateFilter around for more than one search?  The
> drawback to pure DateFilter is that it does not cache, so each search
> re-enumerates the terms in the range.  In fact, DateFilter by itself
> is practically of no use, I think.
>
> If you have a set of canned date ranges, there are two approaches
> worth considering:  DateFilter wrapped by a CachingWrapperFilter, or
> a RangeQuery wrapped in a QueryFilter (which does cache).
>
> Performance-wise, I don't really think there is much (any?) difference
> in these two approaches, so take your pick.  Once the bit sets are
> cached in a filter, searches will be quite fast.

One more point... caching is done by the IndexReader used for the 
search, so you will need to keep that instance (i.e. the IndexSearcher) 
around to benefit from the caching.
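The sketch below models why the IndexSearcher has to be kept around. It memoizes bit sets in a `WeakHashMap` keyed by the reader instance, in the spirit of CachingWrapperFilter; `ToyReader`, `CachingFilterSketch` and the `computations` counter are hypothetical names for illustration, not Lucene classes. A new reader instance means a cache miss, so opening a fresh searcher per query throws the cached bits away.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical stand-in for an index reader; only its identity matters here
// (no equals/hashCode override, so map lookups use object identity).
final class ToyReader {
    final int maxDoc;

    ToyReader(int maxDoc) {
        this.maxDoc = maxDoc;
    }
}

// Sketch of a caching filter wrapper: bit sets are memoized per reader instance,
// so a cache hit requires reusing the same reader (i.e. the same searcher).
final class CachingFilterSketch {
    private final Function<ToyReader, BitSet> inner; // the expensive filter, e.g. a date range
    // Weak keys: an entry can be reclaimed once its reader is no longer referenced.
    private final Map<ToyReader, BitSet> cache = new WeakHashMap<>();
    int computations = 0; // how many times the inner filter actually ran

    CachingFilterSketch(Function<ToyReader, BitSet> inner) {
        this.inner = inner;
    }

    synchronized BitSet bits(ToyReader reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            computations++;
            cached = inner.apply(reader);
            cache.put(reader, cached);
        }
        return cached;
    }
}
```

The first call per reader pays the full cost and later calls return the stored bits, which matches the pattern of one slow search followed by fast ones.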

	Erik



Re: Is RangeQuery more efficient than DateFilter?

2004-03-29 Thread Kevin A. Burton
Erik Hatcher wrote:

> One more point... caching is done by the IndexReader used for the
> search, so you will need to keep that instance (i.e. the
> IndexSearcher) around to benefit from the caching.

Great... Damn... I looked at the source of CachingWrapperFilter and it
makes sense.  Thanks for the pointer.  The results were pretty amazing.
Here are the results before and after (times in milliseconds):

Searching for Jakarta:

Run  Before caching  After caching
1    2238            2253
2    1910            10
3    1899            6
4    1901            8
5    1904            6
6    1906            6

That's a HUGE difference :)

I'm very happy :)

--

Please reply using PGP.

   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster






Re: Is RangeQuery more efficient than DateFilter?

2004-03-29 Thread Stephane James Vaucher
I've added some of the information from this thread to the wiki:

http://wiki.apache.org/jakarta-lucene/DateRangeQueries

If you wish to add more information, go right ahead, but since I added
this info, I believe it's ultimately my responsibility to maintain it.

sv
