Can you please try to develop a reproducible test case? Otherwise it's impossible to verify and debug this.
For something like this it would suffice to provide:
1. The initial index, which satisfies the test queries;
2. The new index you add;
3. Your merge and test code, as a single class that illustrates the problem.
The smaller the indexes the better: not only will it be easier to transfer them, but debugging will be faster.
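As a rough illustration of the "test" half of such a class, here is a minimal sketch in modern Java that reports which canned queries had hits before the merge but none afterwards. The class name QueryRegressionCheck and its regressions method are hypothetical and not part of Lucene. The two hit-count functions are placeholders for whatever search call you run against the index before and after merging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of Lucene: compares hit counts for the
// same canned queries before and after a merge.
class QueryRegressionCheck {

    // Returns the queries that matched at least one document before the
    // merge and match none afterwards, in their original order.
    // hitsBefore and hitsAfter are assumed to wrap a real index search.
    static List<String> regressions(List<String> queries,
                                    ToIntFunction<String> hitsBefore,
                                    ToIntFunction<String> hitsAfter) {
        List<String> failed = new ArrayList<>();
        for (String query : queries) {
            if (hitsBefore.applyAsInt(query) > 0 && hitsAfter.applyAsInt(query) == 0) {
                failed.add(query);
            }
        }
        return failed;
    }
}
```

Printing the returned list after each merge would show exactly which queries broke, and that list is the information a bug report needs.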
Also, you should add a bug to track this, at:
http://issues.apache.org/bugzilla/enter_bug.cgi?product=Lucene
Doug
Terry Steichen wrote:
I did some more checking and uncovered what appears to be a serious Lucene problem. (Either that or my merge code - below - is wrong.) Appreciate any help in figuring out what's wrong. Here are the facts as I see them:
1) I put together a large number of canned queries (some rather complex) for routine testing purposes.
2) I created a new compound file index and tested the queries. All worked fine.
3) I then indexed some new documents and merged the new index with the original index.
4) I then tried the queries again. Each time I did this, about 1-3% of the queries no longer worked - the actual number appears to vary with each merge.
5) The specific queries that fail change with each merge. Ones that failed after the previous merge almost always appear to work again with the next merge (which produces a new batch of failures).
6) In all cases I've so far examined, the offending part of the affected queries is a single quoted phrase (even though there may be several such phrases in the query) - remove it, and the (now modified) query works fine.
7) I tried the same thing using the original multi-file index format, with the same results.
8) About a week and a half ago, I migrated from 1.3final to the latest CVS head.
9) I've only just started checking this, so I don't know how long this behavior has been going on. The small percentage of errors and (apparent) randomness of which query is affected make it hard to detect.
10) I have about 32 fields per document, most of which are tokenized, indexed and stored.
11) My merge code (for the multi-file index format) is this:
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

class MergeIndices {
    public static void main(String[] args) {
        // args[0]: relative path to main index
        // args[1]: relative path to new index (to be merged with main)
        try {
            IndexWriter writer = new IndexWriter(args[0], new StandardAnalyzer(), false);
            // writer.setUseCompoundFile(true); // used for compound format
            FSDirectory dir = FSDirectory.getDirectory(args[1], false);
            FSDirectory[] dirs = new FSDirectory[1];
            dirs[0] = dir;
            writer.addIndexes(dirs);
            writer.optimize();
            writer.close();
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }
}
----- Original Message -----
From: "Terry Steichen" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, March 31, 2004 11:47 AM
Subject: Re: Wierd Search Behavior
No, they're typos in the e-mail. In the application, all the colons are properly placed. (Guess I was/am so frustrated I can't write right any more).
Terry
----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, March 31, 2004 9:55 AM
Subject: Re: Wierd Search Behavior
On Mar 31, 2004, at 9:49 AM, Terry Steichen wrote:
I'm experiencing some very puzzling search behavior. I am using the CVS head I pulled about a week ago. I use the StandardAnalyzer and QueryParser. I have a collection of XML documents indexed. One field is "subhead", and here's what I find with different queries:
subhead:(missile defense) - works fine
subhead("missile" "defense") - works fine
subhead("missile defense") - fails
subhead(missile defense "missile defense") - fails
subhead(missile defense "missile dork") - works fine
subhead(missile defense "missile defens") - works fine (note misspelling)
I presume the missing colons on all but the first example are just typos in your e-mail? If not, might that be the problem?
Erik
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
