Thanks for your response; my comments are below. I'm using Lucene 1.9.1.
 

> From: Erick Erickson [mailto:[EMAIL PROTECTED] 
> Sent: Monday, 14 August 2006 16:20
> Subject: Re: Index not recreated
> 
> My first suspicion is that you have duplicate documents on 
> the *input* side, or are somehow adding documents more than 
> once. I use code similar to yours and it works just fine for me.....


This was my first suspicion too, but the facts seem to rule it out. When I 
create an index from scratch (with no previous index present), everything is 
fine: there are no duplicates. The duplicates only appear on the next rebuild. 
So first I'm going to determine whether the old index is really deleted after 
calling FSDirectory.getDirectory(indexDirectory, true). If it is, I'll check 
whether I'm adding duplicates myself.
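The pattern described in this thread (no duplicates after a fresh build, exactly two copies after one rebuild, three after the next) can be turned into a quick automated check. The sketch below is a hypothetical helper, not part of Lucene; it assumes you can obtain the number of documents in the rebuilt index (for example from IndexReader.numDocs()) and the number of distinct source rows (for example from a SELECT COUNT(DISTINCT ...) query). A whole-number multiple is consistent with the old index surviving the rebuild, but it would also fit the whole input being processed more than once, so treat the result as a hint, not proof.

```java
/**
 * Hypothetical helper (not a Lucene API): compares the number of documents in
 * the rebuilt index with the number of distinct source rows and names the
 * pattern discussed in this thread.
 */
class DuplicationDiagnosis {

    static String classify(int indexedDocs, int sourceRows) {
        if (sourceRows <= 0) {
            throw new IllegalArgumentException("sourceRows must be positive");
        }
        if (indexedDocs == sourceRows) {
            return "OK: one document per source row";
        }
        if (indexedDocs < sourceRows) {
            return "MISSING: fewer documents than source rows";
        }
        if (indexedDocs % sourceRows == 0) {
            // Exactly N copies of everything: consistent with the old index
            // surviving the rebuild, or with the whole input being indexed N times.
            return "WHOLE_MULTIPLE: every row appears " + (indexedDocs / sourceRows) + " times";
        }
        // Some rows duplicated and others not: points at the input side.
        return "PARTIAL: some rows indexed more than once";
    }
}
```

Logging this once per night would also show whether the problem occurred on nights when the index cannot be inspected directly.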


> How big is the index before and after you re-create it? If it's twice 
> the size, you're appending; if not, then.....


An additional problem is that the issue is only reproducible in the production 
environment, and I have very limited access there, so I cannot answer this right 
away. Furthermore, the problem does not always occur, which makes it even more 
fun ;-)
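Since access to production is limited, one option is to have the nightly job itself log the index directory before and after the rebuild. The helper below is a hypothetical sketch using only java.io (it is not a Lucene API); it lists each file's name, size and modification time plus the total size, so that leftover files from the previous night and a doubled index size would both show up in the logs.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

/** Hypothetical debugging helper (not a Lucene API): summarizes an index directory. */
class IndexDirReport {

    /** Returns the regular files directly inside dir, sorted by name. */
    private static List<File> filesIn(File dir) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        Arrays.sort(entries); // stable order makes before/after logs easy to diff
        List<File> files = new ArrayList<>();
        for (File entry : entries) {
            if (entry.isFile()) {
                files.add(entry);
            }
        }
        return files;
    }

    /** Sums the sizes of the regular files directly inside dir. */
    static long totalSize(File dir) throws IOException {
        long total = 0;
        for (File f : filesIn(dir)) {
            total += f.length();
        }
        return total;
    }

    /** One line per file (name, size, last-modified time), then the total size. */
    static String describe(File dir) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (File f : filesIn(dir)) {
            sb.append(f.getName()).append('\t')
              .append(f.length()).append(" bytes\t")
              .append(new Date(f.lastModified())).append('\n');
        }
        sb.append("total bytes: ").append(totalSize(dir));
        return sb.toString();
    }
}
```

For example, the job could log IndexDirReport.describe(new File(indexPath)) before the rebuild, once right after FSDirectory.getDirectory(indexDirectory, true) returns (to see whether the old files are really gone), and once after the IndexWriter has been closed.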

 
> Are you absolutely sure that you're not somehow adding 
> documents more than once? I can imagine that this could occur 
> by processing the source multiple times (don't know how you 
> get your input) or adding the document multiple times through 
> some logic error. I've also had my SQL queries return the 
> same row more than once on occasion, usually cured with the 
> "distinct" qualifier.
> 
> If you have some sort of unique ID, I can imagine debug code 
> with a set of IDs and error reporting when you add a doc 
> (row) already in your index.....


Once I'm absolutely sure that the original index is removed by calling 
FSDirectory.getDirectory(indexDirectory, true), I'll explore this possibility 
and add extensive logging to the code that adds documents (I do have a unique 
id, so this can be checked).
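Erick's idea of keeping a set of IDs could look roughly like the following. It is a hypothetical helper (the class and method names are made up for illustration) that you would call with each row's unique id just before IndexWriter.addDocument(). Note that it only detects duplicates within a single run: if it reports none but the rebuilt index still contains every document twice, the extra copies most likely did not come from this run's input.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical debugging aid: remembers every unique id handed to the indexer
 * during one run and reports any id that shows up a second time.
 */
class DuplicateIdTracker {
    private final Set<String> seen = new HashSet<>();
    private int duplicates = 0;

    /** Returns true if id is new in this run; otherwise logs it and returns false. */
    boolean check(String id) {
        if (seen.add(id)) {
            return true;
        }
        duplicates++;
        System.err.println("Duplicate document id in this run: " + id
                + " (at " + System.currentTimeMillis() + " ms)");
        return false;
    }

    int duplicateCount() {
        return duplicates;
    }

    int distinctCount() {
        return seen.size();
    }
}
```

For diagnosis it is probably best to keep calling addDocument() even when check() returns false, so the behaviour being investigated does not change; log duplicateCount() and distinctCount() at the end of the run.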


> Luke will help you examine your index to see if it's what you 
> think is there. Perhaps another way to test this would be to 
> add (again for debugging) a timestamp field in your index. That 
> way, you would know when you added your duplicate rows.


I haven't used Luke to look at the index yet since, unfortunately, I haven't 
been able to get my hands on the actual index.
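For the timestamp-field suggestion, one simple approach is to compute a single stamp at the start of each nightly run and store it in every document; when the index can be opened in Luke, copies left over from an earlier night would then carry a different stamp. The sketch below only produces the stamp string (UTC, yyyyMMddHHmmss, so stamps sort correctly as plain strings). The field name and how it is added are assumptions, e.g. a stored, untokenized field such as new Field("indexRunStamp", stamp, Field.Store.YES, Field.Index.UN_TOKENIZED) in the Lucene 1.9 API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper: formats a per-run timestamp for a debugging field. */
class RunStamp {

    /** Formats millis as yyyyMMddHHmmss in UTC, so stamps sort as strings. */
    static String format(long millis) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(millis));
    }
}
```

Compute it once per run, e.g. String stamp = RunStamp.format(System.currentTimeMillis()), and reuse the same value for every document added in that run.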


> Finally, you might try creating an index in a new directory 
> that you *know* is empty and seeing what you get and how it 
> compares against your current process. Although I'd expect 
> your IndexWriter code to barf if you had file locking issues 
> and couldn't empty the index, I suppose it's possible....


That's a good solution if all my other attempts fail :)
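Erick's last suggestion, building into a directory that is guaranteed to be empty, might be sketched like this. The helper and its naming scheme ("<index>.build-<millis>") are hypothetical; the idea is to pass the returned directory to FSDirectory.getDirectory(..., true) and the IndexWriter instead of the live index path, then compare the resulting document count with what the in-place rebuild produces.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: creates a brand-new, empty sibling directory for a rebuild. */
class FreshIndexDir {

    static File create(File liveIndexDir, long runMillis) throws IOException {
        File parent = liveIndexDir.getAbsoluteFile().getParentFile();
        File fresh = new File(parent, liveIndexDir.getName() + ".build-" + runMillis);
        // mkdir() returns false if the path already exists (or cannot be created),
        // so success means the directory was newly created and is therefore empty.
        if (!fresh.mkdir()) {
            throw new IOException("Could not create fresh directory " + fresh);
        }
        return fresh;
    }
}
```

Switching the live index over to the new directory afterwards (for example by renaming) is a separate step that is not shown here.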



> On 8/14/06, Ronald Wildenberg <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > I'm experiencing the problem that my index does not seem to be 
> > recreated, despite using the correct flags. The result is that 
> > documents that represent the same database rows occur multiple 
> > times in the index. I recreate my entire index each night.
> >
> > My IndexDirectory/IndexWriter construction code looks like this:
> >
> >    File indexDirectory = new File(indexPath);
> >    FSDirectory luceneIndexDirectory =
> >        FSDirectory.getDirectory(indexDirectory, true);
> >    IndexWriter indexWriter =
> >        new IndexWriter(luceneIndexDirectory, analyzer, true);
> >
> > This code should take care of recreating my index, but it does not 
> > seem to be working properly. It looks like the old index is not 
> > removed and the same documents are added to my index again.
> >
> > I have strong reasons not to suspect that other code adds duplicate 
> > documents. First, if no index has been created yet, no duplicate 
> > documents are added. Second, if an old index does exist, after 
> > recreating the index all documents exist exactly twice (and the 
> > following night they exist three times, etc.). It is not the case 
> > that only some documents are duplicated.
> >
> > Does anyone have any ideas?
> >
> > Thanks in advance,
> > Ronald.
> >
> >
> > DISCLAIMER:
> >
> > This message (with attachments) is given in good faith. Kennisnet 
> > cannot assume any responsibility for the accuracy or reliability of 
> > the information contained in this message (with attachments), nor 
> > shall the information be construed as constituting any obligation on 
> > the part of Kennisnet. The information contained in this message (with 
> > attachments) may be confidential or privileged and is only intended 
> > for the use of the named addressee. If you are not the intended 
> > recipient, you are requested by Kennisnet to delete this message (with 
> > attachments) without opening it and you are notified by Kennisnet that 
> > any disclosure, copying or distribution of the information contained 
> > in this message (with attachments) is strictly prohibited and unlawful.
> >
> >
> >
> 

