is there a way to find duplicate documents in the index?

2006-03-13 Thread emerson cargnin
I noticed some duplicated entries in my index just by looking at it, and I suspect there are more than the ones I found. Is there a way to detect duplicate documents in an index? Emerson Cargnin
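
One way to approach the question above is to group document numbers by the value of a field that should be unique and report every group with more than one member. The sketch below is only an illustration under stated assumptions: it does not call Lucene, the class name DuplicateFinder and the map of document number to key value are hypothetical, and the key values are assumed to have been read from the index beforehand (for example by looping up to IndexReader.maxDoc() and reading a stored field from IndexReader.document(i)).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class DuplicateFinder {

    /**
     * Groups document numbers by their key value and keeps only the groups
     * that contain two or more documents, i.e. the suspected duplicates.
     *
     * @param keyByDocId document number mapped to the value of the field
     *                   that is expected to be unique (an assumption here)
     */
    static Map<String, List<Integer>> findDuplicates(Map<Integer, String> keyByDocId) {
        Map<String, List<Integer>> idsByKey = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : keyByDocId.entrySet()) {
            idsByKey.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                    .add(entry.getKey());
        }
        // Drop every key that occurs only once; what remains are duplicates.
        idsByKey.values().removeIf(ids -> ids.size() < 2);
        return idsByKey;
    }
}
```

If no single field identifies a document, the same grouping could be applied to a hash of several field values instead of one key.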

collapsing the result by a given field

2006-03-11 Thread emerson cargnin
In my company's system we need to run a search that returns hundreds of results. It's a search over extracts of the websites of the companies we list. We have the IDs of the companies (usually at most 10 are used in each search), which are used to bring in the extracts (each ID may have hundreds of

using writer.setMergeFactor(1000) instead of writer.mergeFactor=1000 breaks lucene

2006-03-10 Thread emerson cargnin
I just updated to Lucene 1.9.1, and when I use the IndexWriter method writer.setMergeFactor(1000) instead of the property writer.mergeFactor=1000, it breaks Lucene completely... any clue? Emerson Cargnin

Re: using writer.setMergeFactor(1000) instead of writer.mergeFactor=1000 breaks lucene

2006-03-10 Thread emerson cargnin
My fault: the Eclipse project I was running had a project reference that was still pointing to the old version of Lucene. Sorry for that :) Emerson On 10/03/06, emerson cargnin [EMAIL PROTECTED] wrote: I just updated to Lucene 1.9.1 and when I use the method of IndexWriter
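
A stale project reference like the one described above can be spotted by asking the JVM where a class was actually loaded from. The following is a minimal sketch using only the standard library; the class name ClassOrigin is hypothetical, and the returned location may be absent for classes loaded by the bootstrap loader.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Lucene.
final class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, if known. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            // Typical for core JDK classes loaded by the bootstrap loader.
            return "(bootstrap or unknown)";
        }
        URL location = source.getLocation();
        return location == null ? "(unknown)" : location.toString();
    }
}
```

Printing ClassOrigin.locationOf(org.apache.lucene.index.IndexWriter.class) from inside the running project would show whether the old or the new Lucene jar is on the classpath.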

Re: Restricting the number of docs per search field

2006-02-28 Thread emerson cargnin
Does anyone know a solution for that? I know there's a method that returns a TopDocs, but it needs a filter, and in my case I'll need the first 2 docs for each value of a given property. On 27/02/06, emerson cargnin [EMAIL PROTECTED] wrote: Hi all Due to a performance problem, I'm

Re: Restricting the number of docs per search field

2006-02-28 Thread emerson cargnin
, as Lucene is pretty fast. Have you done profiling, etc. to determine that Lucene is actually the bottleneck? -Grant emerson cargnin wrote: Does anyone know a solution for that? I know there's a method that returns a TopDocs, but it needs a filter, and in my case, I'll need the first 2

Restricting the number of docs per search field

2006-02-27 Thread emerson cargnin
Hi all, Due to a performance problem, I'm looking for a way of restricting the docs returned based on the number of docs that have the same value in a field. At the moment we just discard the docs if there are more than X for the same field value, but I think it can be done by Lucene, hence improving a lot the
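
The discard rule described in this message (keep at most X docs per distinct field value) can be sketched as a single pass over the hits in rank order. This is only an illustration of the rule as applied after the search, not a Lucene feature: the names PerFieldLimiter, Hit, and limitPerValue are hypothetical, and the hits are assumed to arrive already sorted by relevance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing helper, not part of Lucene.
final class PerFieldLimiter {

    /** A search hit reduced to its document number and grouping field value. */
    static final class Hit {
        final int docId;
        final String fieldValue;

        Hit(int docId, String fieldValue) {
            this.docId = docId;
            this.fieldValue = fieldValue;
        }
    }

    /**
     * Keeps the first maxPerValue hits for each distinct field value,
     * preserving the original rank order of the hits that survive.
     */
    static List<Hit> limitPerValue(List<Hit> rankedHits, int maxPerValue) {
        Map<String, Integer> seenPerValue = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : rankedHits) {
            // merge() increments the counter for this value and returns it.
            int count = seenPerValue.merge(hit.fieldValue, 1, Integer::sum);
            if (count <= maxPerValue) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

Because the pass touches every hit once, its cost grows with the number of hits returned, which is why applying the limit earlier, inside the search itself, is the more attractive option when a query matches many docs.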