Grant
Thanks for your response. I have fixed this issue. I have indexed 5 MB
worth of text files and I now only use 224 KB. I was getting 80 MB. The
only change I made was to change the way I merge my temp index into my prod
index. My code changed from:
prodWriter.setUseCompoundFile(true);
prodWriter.addIndexes(new IndexReader[] { tempReader });
To:
int iNumDocs = tempReader.numDocs();
for (int y = 0; y < iNumDocs; y++) {
    // document(y) returns the stored fields of document y; fields that
    // were indexed but not stored are not carried over by this copy.
    Document tempDoc = tempReader.document(y);
    prodWriter.addDocument(tempDoc);
}
I don't know whether this is a bug in the
IndexWriter.addIndexes(IndexReader[]) method or something else I was
doing that caused it, but I am getting much better results now.
Thanks to everyone who helped, I really appreciate it.
Rob
----- Original Message -----
From: "Grant Ingersoll" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, August 19, 2004 10:51 AM
Subject: Re: Index Size
How many fields do you have and what analyzer are you using?
>>> [EMAIL PROTECTED] 8/19/2004 11:54:25 AM >>>
Otis
I upgraded to 1.4.1. I deleted all of my old indexes and started from
scratch. I indexed 2 MB worth of text files and my index size is 8
MB.
Would it be better if I stopped using the
IndexWriter.addIndexes(IndexReader[]) method and instead traversed the
IndexReader on the temp index, adding each document to the production
index via IndexWriter.addDocument(Document)?
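
For concreteness, here is a rough sketch of that approach against the
Lucene 1.4 API (the paths below are placeholders, not my actual setup):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

IndexReader tempReader = IndexReader.open("/path/to/tempIndex");
IndexWriter prodWriter = new IndexWriter(
    "/path/to/prodIndex", new StandardAnalyzer(), false);
prodWriter.setUseCompoundFile(true);

int numDocs = tempReader.numDocs();
for (int i = 0; i < numDocs; i++) {
    // document(i) returns only the stored fields, so any field that
    // was indexed but not stored would not survive this copy.
    prodWriter.addDocument(tempReader.document(i));
}

prodWriter.optimize();
prodWriter.close();
tempReader.close();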
Thanks again for your input, I appreciate it.
Rob
----- Original Message -----
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, August 19, 2004 8:00 AM
Subject: Re: Index Size
Just go for 1.4.1 and look at the CHANGES.txt file to see if there
were any index format changes. If there were, you'll need to re-index.
Otis
--- Rob Jose <[EMAIL PROTECTED]> wrote:
> Otis
> I am using Lucene 1.3 final. Would it help if I move to Lucene 1.4
> final?
>
> Rob
> ----- Original Message -----
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Thursday, August 19, 2004 7:13 AM
> Subject: Re: Index Size
>
>
> I thought this was the case. I believe there was a bug in one of the
> recent Lucene releases that caused old CFS files not to be removed
> when they should be removed. This resulted in your index directory
> containing a bunch of old CFS files consuming your disk space.
>
> Try getting a recent nightly build and see if using that takes care
> of your problem.
>
> Otis
>
> --- Rob Jose <[EMAIL PROTECTED]> wrote:
>
> > Hey George
> > Thanks for responding. I am using Windows and I don't see any
> > hidden files. I have a ton of CFS files (1366/1405). I have 22 F#
> > (F1, F2, etc.) files. I have two FDT files and two FDX files. And
> > three FNM files. Add these files to the deletable and segments file
> > and that is all of the files that I have. The CFS files are
> > approximately 11 MB each. The totals I gave you before were for all
> > of my indexes together. This particular index has a size of 21.6
> > GB. The files that it indexed have a size of 89 MB.
> >
> > OK - I just removed all of the CFS files from the directory and I
> > can still read my indexes. So now I have to ask: what are these CFS
> > files? Why are they created? And how can I get rid of them if I
> > don't need them? I will also take a look at the Lucene website to
> > see if I can find any information.
> >
> > Thanks
> > Rob
> >
> > ----- Original Message -----
> > From: "Honey George" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > Sent: Thursday, August 19, 2004 12:29 AM
> > Subject: Re: Index Size
> >
> >
> > Hi,
> > Please check for hidden files in the index folder. If
> > you are using Linux, do something like
> >
> > ls -al <index folder>
> >
> > I am also facing a similar problem where the index
> > size is greater than the data size. In my case there
> > were some hidden temporary files which Lucene
> > creates. Those were taking half of the total size.
> >
> > My problem is that after deleting the temporary files,
> > the index size is the same as the data size. That
> > again seems to be a problem. I am yet to find out the
> > reason.
> >
> > Thanks,
> > george
> >
> >
> > --- Rob Jose <[EMAIL PROTECTED]> wrote:
> > > Hello
> > > I have indexed several thousand (52 to be exact)
> > > text files and I keep running out of disk space to
> > > store the indexes. The size of the documents I have
> > > indexed is around 2.5 GB. The size of the Lucene
> > > indexes is around 287 GB. Does this seem correct?
> > > I am not storing the contents of the file, just
> > > indexing and tokenizing. I am using Lucene 1.3
> > > final. Can you guys let me know what you are
> > > experiencing? I don't want to go into production
> > > with something that I should be configuring better.
> > >
> > >
> > > I am not sure if this helps, but I have a temp index
> > > and a real index. I index the file into the temp
> > > index, and then merge the temp index into the real
> > > index using the addIndexes method on the
> > > IndexWriter. I have also called setUseCompoundFile(true)
> > > on the production writer; I did not set this on the
> > > temp index. The last thing that I do before
> > > closing the production writer is to call the
> > > optimize method.
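> > >
> > > A rough sketch of that flow against the Lucene 1.4 API (the
> > > paths, field name, and analyzer below are placeholders, not my
> > > actual setup):
> > >
> > > Document doc = new Document();
> > > // tokenized and indexed but not stored (Field.Text with a Reader)
> > > doc.add(Field.Text("contents", new FileReader("some.txt")));
> > >
> > > IndexWriter tempWriter = new IndexWriter(
> > >     "/path/to/tempIndex", new StandardAnalyzer(), true);
> > > tempWriter.addDocument(doc);
> > > tempWriter.close();
> > >
> > > IndexReader tempReader = IndexReader.open("/path/to/tempIndex");
> > > IndexWriter prodWriter = new IndexWriter(
> > >     "/path/to/prodIndex", new StandardAnalyzer(), false);
> > > prodWriter.setUseCompoundFile(true);
> > > prodWriter.addIndexes(new IndexReader[] { tempReader });
> > > prodWriter.optimize();
> > > prodWriter.close();
> > > tempReader.close();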
> > >
> > > I would really appreciate any ideas to get the index
> > > size smaller if it is at all possible.
> > >
> > > Thanks
> > > Rob