Re: Performance of never optimizing

2008-11-04 Thread Toke Eskildsen
On Mon, 2008-11-03 at 23:37 +0100, Justus Pendleton wrote: What constitutes a proper warm-up before measuring? The simplest way is to run a number of searches before you start measuring. The first searches are always very slow compared to later searches. If you look at
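The warm-up advice above can be sketched as a small harness that runs the workload a number of times, throws those timings away, and only then starts measuring. This is a hypothetical helper, not part of Lucene: `WarmupBenchmark`, `run`, `warmupRuns` and `measuredRuns` are illustrative names, and the `Supplier` is assumed to wrap one search against your own searcher (handling any `IOException` inside the lambda).

```java
import java.util.function.Supplier;

/**
 * Hypothetical benchmark helper (not part of Lucene): runs the workload
 * warmupRuns times and discards those timings, then times measuredRuns
 * further executions and reports their mean latency.
 */
final class WarmupBenchmark {
    private WarmupBenchmark() {}

    /** Outcome of the timed phase only; warm-up runs are not included. */
    static final class Result {
        final int measuredRuns;
        final double meanNanos;

        Result(int measuredRuns, double meanNanos) {
            this.measuredRuns = measuredRuns;
            this.meanNanos = meanNanos;
        }
    }

    /**
     * @param query        one search, e.g. a lambda wrapping a call on your searcher
     * @param warmupRuns   executions performed before timing starts (results ignored)
     * @param measuredRuns executions whose elapsed time is recorded
     */
    static <T> Result run(Supplier<T> query, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        // Warm-up phase: intended to let OS file caches, index caches and the JIT settle.
        for (int i = 0; i < warmupRuns; i++) {
            query.get();
        }
        // Measurement phase: only these executions are timed.
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            query.get();
            totalNanos += System.nanoTime() - start;
        }
        return new Result(measuredRuns, (double) totalNanos / measuredRuns);
    }
}
```

How many warm-up runs are "enough" is an assumption you would tune by checking that latencies have stopped dropping; the harness only guarantees that the first, slow searches are excluded from the average.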

AUTO: Zhou Lin Dai is out of the office. (returning 2008-11-10)

2008-11-04 Thread Zhou Lin Dai
I am out of the office until 2008-11-10. Raja (He Kun Wang) will be my backup during my leave. I will check emails at night. For anything urgent, you can call my cell phone (86) 131 6290 0375. Note: This is an automated response to your message Can lucene search from multi-index directory

Re: Can lucene search from multi-index directory like using FK in database?

2008-11-04 Thread Erick Erickson
See below On Tue, Nov 4, 2008 at 7:31 AM, Clay Zhong [EMAIL PROTECTED] wrote: Hi Guys, I have run into some problems when using Lucene 2.3.2. After a lot of research, I still can't find any way to solve them. I hope you can give me some advice. 1. Can I search different documents from multi-index

Re: No segment files found/ Searcher error

2008-11-04 Thread Michael McCandless
Can you post some code of the merging process that adds documents to the current day's index? It is definitely spooky that CheckIndex reports it cannot find any segments file. The message in that exception should end with ...: files: XXX, i.e., it says it could not find any segments_N
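As a hedged illustration of what "could not find any segments_N" refers to: each commit of a Lucene 2.x index is recorded in a file named `segments_N`, where N is the commit generation written in base 36 (`Character.MAX_RADIX`), and the newest commit is the one with the largest N. The sketch below picks that file from a plain directory listing. `SegmentsFileFinder` and `latestSegmentsFile` are hypothetical names; the helper ignores `segments.gen` and any legacy file named plain `segments`, and Lucene's own lookup logic is more involved than this.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper (not Lucene's code): given the file names in an index
 * directory, returns the segments_N file with the highest generation.
 */
final class SegmentsFileFinder {
    private static final String PREFIX = "segments_";

    private SegmentsFileFinder() {}

    /** Returns the newest segments_N name, or empty if the listing has none. */
    static Optional<String> latestSegmentsFile(List<String> fileNames) {
        String best = null;
        long bestGen = -1;
        for (String name : fileNames) {
            // "segments.gen" and data files such as "_0.cfs" do not match the prefix.
            if (!name.startsWith(PREFIX)) {
                continue;
            }
            long gen;
            try {
                // The generation suffix is encoded in base 36.
                gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
            } catch (NumberFormatException e) {
                continue; // not a well-formed commit file name
            }
            if (gen > bestGen) {
                bestGen = gen;
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

An empty result on a directory that a searcher is being opened against corresponds to the situation in this thread: there is no commit point to read, for example because the listing was taken while the index was being rewritten.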

Re: No segment files found/ Searcher error

2008-11-04 Thread Michael McCandless
So it sounds like the Input/Output error was in fact because you were closing the IndexSearcher while an in-flight query was still using it? Or... are you still seeing that error now that you've switched to opening a new IndexSearcher for the current day for every query? It's very
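The failure mode Michael describes, closing an `IndexSearcher` while an in-flight query is still using it, is commonly avoided with reference counting: the searcher is only really closed once the last user releases it. The class below is a generic, hypothetical sketch (`RefCountedHandle`, `tryAcquire` and `release` are illustrative names, not a Lucene API); the `Consumer` is assumed to call the searcher's own close method and deal with its `IOException`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical reference-counting wrapper (not a Lucene class). The owner
 * holds one reference; each query acquires one before searching and releases
 * it in a finally block. The resource is closed only when the count reaches
 * zero, so retiring an old searcher never pulls it out from under a query.
 */
final class RefCountedHandle<T> {
    private final T resource;
    private final Consumer<T> closer;
    private final AtomicInteger refCount = new AtomicInteger(1); // the owner's reference

    RefCountedHandle(T resource, Consumer<T> closer) {
        this.resource = resource;
        this.closer = closer;
    }

    /** Takes a reference for one query; returns null if already fully released. */
    T tryAcquire() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return null;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return resource;
            }
        }
    }

    /** Drops one reference; whoever drops the last one closes the resource. */
    void release() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closer.accept(resource);
        } else if (remaining < 0) {
            throw new IllegalStateException("released more times than acquired");
        }
    }
}
```

Usage: a query calls `tryAcquire()`, searches, and calls `release()` in a `finally` block; when a fresh searcher replaces this one, the owner calls `release()` once, and the old searcher closes as soon as its last in-flight query finishes.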

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-04 Thread PabloS
Sure Todd, the idea basically consists of the following: - Subclassing FieldSortedHitQueue and calling super with an empty SortField array: this disables caching because the comparators are retrieved during construction - Creating a new SortComparatorSource that creates the sort comparators

Re: Performance of never optimizing

2008-11-04 Thread Michael McCandless
If possible, you should try to use a larger corpus (e.g., Wikipedia) rather than multiply Reuters by N, which creates an unnatural term frequency distribution. The graphs are hard to read because of the spline interpolation. Maybe you could overlay X's where there is a real datapoint? After
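A toy calculation shows one way in which "Reuters times N" differs from a genuinely larger collection: every term's document frequency is multiplied by exactly N, so the relative frequencies are frozen and the vocabulary never grows, whereas a real corpus keeps introducing new terms as it gets bigger. `ReplicatedCorpus` and its methods are hypothetical names, and the documents are assumed to be already tokenized into lists of terms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration (not a benchmark tool) of replicating a corpus N times. */
final class ReplicatedCorpus {
    private ReplicatedCorpus() {}

    /** Document frequency of each term: the number of documents that contain it. */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> doc : docs) {
            // Count each distinct term once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Concatenates n copies of the corpus, mimicking "multiply Reuters by N". */
    static List<List<String>> replicate(List<List<String>> docs, int n) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.addAll(docs);
        }
        return out;
    }
}
```

For a two-document corpus {"oil price oil", "price wheat"}, the frequencies are oil=1, price=2, wheat=1; after replicating three times they become oil=3, price=6, wheat=3, with the same three terms and nothing new.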

BoostingTermQuery scoring

2008-11-04 Thread Peter Keegan
I'm using BoostingTermQuery to boost the score of documents with terms containing payloads (boost value 1). I'd like to change the scoring behavior such that if a query contains multiple BoostingTermQuery terms (either required or optional), documents containing more matching terms with payloads
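To reason about how several BoostingTermQuery clauses combine, it may help to write down a toy model of how a BooleanQuery merges clause scores under Lucene 2.x's default similarity: the scores of the matching clauses are summed, and the sum is multiplied by `coord = overlap / maxOverlap` (the fraction of clauses that matched). This is a simplified sketch, not Lucene code: `ToyBooleanScore` is a hypothetical name, the per-clause base scores and payload multipliers are made-up inputs, and real scoring also involves idf, norms and query normalization.

```java
/**
 * Hypothetical toy model (not Lucene code): each matching clause contributes
 * termScore * payloadMultiplier, the contributions are summed, and the sum is
 * scaled by coord = matchingClauses / totalClauses.
 */
final class ToyBooleanScore {
    private ToyBooleanScore() {}

    /** Same formula as DefaultSimilarity.coord(overlap, maxOverlap) in Lucene 2.x. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /**
     * @param termScores         per-clause base score; 0 means the clause did not match
     * @param payloadMultipliers per-clause multiplier derived from the term's payloads
     */
    static float score(float[] termScores, float[] payloadMultipliers) {
        float sum = 0f;
        int matched = 0;
        for (int i = 0; i < termScores.length; i++) {
            if (termScores[i] > 0f) {
                sum += termScores[i] * payloadMultipliers[i];
                matched++;
            }
        }
        return sum * coord(matched, termScores.length);
    }
}
```

With two clauses, base scores of 1 and payload multipliers of 2, a document matching both scores (2 + 2) * 1.0 = 4, while one matching only the first scores 2 * 0.5 = 1. In this model the coord factor already rewards documents that match more clauses; whether that is the behavior the original poster wants depends on the rest of his message.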

Re: No segment files found/ Searcher error

2008-11-04 Thread JulieSoko
Yes, I am leaving the searchers open for all indexes except for the current day. The index for the current day is constantly being updated and if I happen to have the Input/Output error/no segment files found error while searching the current day then that searcher will continue to return the

Re: No segment files found/ Searcher error

2008-11-04 Thread JulieSoko
I have seen the error all along... I've tried several different designs... This problem has always occurred on the current day, where the index is constantly being merged. I am opening one searcher for up to 59 days and leaving them open, but for the current day only, each user gets their own
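The layout described in this thread (one long-lived searcher per past day, shared by everyone, and a freshly opened searcher for the current day, whose index is still changing) can be sketched as a small cache keyed by date. `DailySearchers` and `searcherFor` are hypothetical names; `S` stands for whatever searcher type is used, and the `opener` function is an assumption (for example, a lambda that opens a searcher on that day's index directory).

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: searchers for past days are opened once and reused;
 * the current day's searcher is opened fresh on every call because that
 * day's index is still being modified.
 */
final class DailySearchers<S> {
    private final Map<LocalDate, S> pastDays = new ConcurrentHashMap<>();
    private final Function<LocalDate, S> opener;

    DailySearchers(Function<LocalDate, S> opener) {
        this.opener = opener;
    }

    /** Returns a shared searcher for a past day, or a new one for {@code today}. */
    S searcherFor(LocalDate day, LocalDate today) {
        if (day.equals(today)) {
            // Fresh searcher: the caller is responsible for closing it after the query.
            return opener.apply(day);
        }
        // Opened on first use, then shared by all later callers.
        return pastDays.computeIfAbsent(day, opener);
    }

    /** Number of past days that currently have a cached searcher. */
    int cachedDays() {
        return pastDays.size();
    }
}
```

The design trades extra open cost on the current day for never serving a stale view of it; a reference-counting scheme is one way to make closing the per-query searchers safe.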

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-04 Thread Todd Benge
Thanks Pablo. I'll be flying to New Orleans tomorrow for ApacheCon and would love the opportunity to talk with others about the architectures they are using. Todd On 11/4/08, PabloS [EMAIL PROTECTED] wrote: Sure Todd, the idea basically consists of the following: - Subclassing

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-04 Thread Pablo Saavedra
I hope that helps. If you find anything interesting, do post it somewhere. I'm afraid I'm a little far from New Orleans at the moment. Regards. 2008/11/4 Todd Benge [EMAIL PROTECTED] Thanks Pablo. I'll be flying to New Orleans tomorrow for ApacheCon and would love the opportunity to

Re: Performance of never optimizing

2008-11-04 Thread Justus Pendleton
On 05/11/2008, at 4:36 AM, Michael McCandless wrote: If possible, you should try to use a larger corpus (e.g., Wikipedia) rather than multiply Reuters by N, which creates an unnatural term frequency distribution. I'll replicate the tests with the Wikipedia corpus over the next few days and