Thanks! Now that I think of it, I was searching the documentation for
a method to reset the document 'd' to "empty" once it is indexed, so that
it could be reused, but I didn't find one, and that is how the bug slipped
through. I was afraid that all these objects might not be garbage
collected in time. In a test much smaller than infinite:
for (int i = 0; i <= 100000000; i++) {
    // A fresh Document per iteration, so fields do not accumulate
    Document d = new Document();
    d.add(Field.Keyword("nr", Integer.toString(i)));
    d.add(Field.Keyword("element", "POST"));
    writer.addDocument(d);
}
I very soon got a java.lang.OutOfMemoryError but, by just forcing garbage
collection at the end of the cycle, the memory usage is now a very flat
line... Sorry for bothering you.
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 27, 2002 2:24 PM
To: Lucene Users List
Subject: Re: Problems with exact matces on non-tokenized fields...
Alex Murzaku wrote:
> I was trying this as well but now I get something I can't understand:
> My query (Query: +element:POST +nr:3) is supposed to match only one
> record. Indeed Lucene returns that record with the highest score but
> it also returns others that shouldn't be there at all even if it was
> an OR query. Another observation: it returns all records where "nr" >=
> 3. Notice the last record returned contains neither "POST" nor
> "3". I am attaching a self contained running example with this problem
> and would appreciate any comment.
>
> 0.6869936 Keyword<nr:3> Keyword<element:POST>
> 0.63916886 Keyword<nr:4> Keyword<element:POST>
> 0.6044586 Keyword<nr:6> Keyword<element:POST>
> 0.5773442 Keyword<nr:5> Keyword<element:POST>
> 0.56318253 Keyword<nr:9> Keyword<element:POST>
> 0.54449975 Keyword<nr:8> Keyword<element:POST>
> 0.5247468 Keyword<nr:7> Keyword<element:POST>
> 0.45054603 Keyword<nr:10> Keyword<element:GET>
Phew! It took me a while to spot this one...
The bug is with your test program. You keep adding fields to the same
document instance. If you change your program to print the entire
document, you'll see:
Query: +element:POST +nr:3
0.6869936 Document<Keyword<element:POST> Keyword<nr:3>
Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>
Keyword<element:POST> Keyword<nr:0>>
0.63916886 Document<Keyword<element:POST> Keyword<nr:4>
Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>
Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>
0.6044586 Document<Keyword<element:POST> Keyword<nr:6>
Keyword<element:POST> Keyword<nr:5> Keyword<element:POST> Keyword<nr:4>
Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>
Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>
0.5773442 Document<Keyword<element:POST> Keyword<nr:5>
Keyword<element:POST> Keyword<nr:4> Keyword<element:POST> Keyword<nr:3>
Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>
Keyword<element:POST> Keyword<nr:0>>
0.56318253 Document<Keyword<element:POST> Keyword<nr:9>
Keyword<element:POST> Keyword<nr:8> Keyword<element:POST> Keyword<nr:7>
Keyword<element:POST> Keyword<nr:6> Keyword<element:POST> Keyword<nr:5>
Keyword<element:POST> Keyword<nr:4> Keyword<element:POST> Keyword<nr:3>
Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>
Keyword<element:POST> Keyword<nr:0>>
0.54449975 Document<Keyword<element:POST> Keyword<nr:8>
Keyword<element:POST> Keyword<nr:7> Keyword<element:POST> Keyword<nr:6>
Keyword<element:POST> Keyword<nr:5> Keyword<element:POST> Keyword<nr:4>
Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>
Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>
0.5247468 Document<Keyword<element:POST> Keyword<nr:7>
Keyword<element:POST> Keyword<nr:6> Keyword<element:POST> Keyword<nr:5>
Keyword<element:POST> Keyword<nr:4> Keyword<element:POST> Keyword<nr:3>
Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>
Keyword<element:POST> Keyword<nr:0>>
0.45054603 Document<Keyword<element:GET> Keyword<nr:10>
Keyword<element:POST> Keyword<nr:9> Keyword<element:POST> Keyword<nr:8>
Keyword<element:POST> Keyword<nr:7> Keyword<element:POST> Keyword<nr:6>
Keyword<element:POST> Keyword<nr:5> Keyword<element:POST> Keyword<nr:4>
Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>
Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>
So you need to create a new document instance each time. I've attached
a modified version of your test program that does this and gives the
results you desire:
Query: +element:POST +nr:3
1.0 Document<Keyword<element:POST> Keyword<nr:3>>
Doug
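
For readers following the thread, here is a minimal sketch of the mistake Doug describes and of the fix. It deliberately does not use Lucene: Doc and the list standing in for the index are hypothetical stand-ins invented for illustration, and the method names are assumptions, not anything from the attached test program. The field order also differs from the dumps above, which list the newest fields first.

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentReuseDemo {

    // Hypothetical stand-in for a document: an ordered list of "name:value" fields.
    static class Doc {
        final List<String> fields = new ArrayList<>();

        void add(String name, String value) {
            fields.add(name + ":" + value);
        }
    }

    // Buggy variant: one Doc instance is shared by every iteration,
    // so each "indexed" document also carries all earlier fields.
    public static List<List<String>> indexReusingOneDocument(int count) {
        List<List<String>> index = new ArrayList<>();
        Doc d = new Doc();
        for (int i = 0; i < count; i++) {
            d.add("nr", Integer.toString(i));
            d.add("element", "POST");
            // Store a snapshot of the fields as they are at indexing time.
            index.add(new ArrayList<>(d.fields));
        }
        return index;
    }

    // Fixed variant: a new Doc per iteration, so each document has exactly two fields.
    public static List<List<String>> indexWithFreshDocuments(int count) {
        List<List<String>> index = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Doc d = new Doc();
            d.add("nr", Integer.toString(i));
            d.add("element", "POST");
            index.add(new ArrayList<>(d.fields));
        }
        return index;
    }

    public static void main(String[] args) {
        // prints [nr:0, element:POST, nr:1, element:POST, nr:2, element:POST]
        System.out.println(indexReusingOneDocument(3).get(2));
        // prints [nr:2, element:POST]
        System.out.println(indexWithFreshDocuments(3).get(2));
    }
}
```

In the buggy variant the third document still contains nr:0 and nr:1, which is why a query for a single "nr" value matches every document added after it.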