Hi Jerry,

Is it possible for you to sample the data beforehand?
Do you have any idea of the potential size of the document content, and is the resulting index size deterministic? For instance, if you index 4K of data, do you end up with an index 1.2K in size?

Is it possible to use a criterion like that to roughly calculate the size of the index and use it as a guideline during the load process? Then, once you get close to your threshold, close and optimize the index to check its size, and re-open and re-load if required until you reach the desired index size.
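
As a rough illustration of that idea, here is a minimal sketch in Python (not Lucene.Net code). The function names, the 4096 to 1229 byte sample figures and the 90% safety margin are all assumptions made up for the example; the real ratio would have to be measured on your own data.

```python
def estimate_index_bytes(raw_bytes_added, sample_raw_bytes, sample_index_bytes):
    """Scale the raw bytes indexed so far by the index/raw ratio
    measured on a sample (hypothetical helper, not a Lucene API)."""
    ratio = sample_index_bytes / sample_raw_bytes
    return raw_bytes_added * ratio


def should_checkpoint(data_bytes, est_index_bytes, threshold_bytes, margin=0.9):
    """Return True once data plus estimated index size reaches a fraction
    (margin, an assumed safety factor) of the disc threshold. That is the
    point at which to close and optimize the index and measure its true size."""
    return data_bytes + est_index_bytes >= margin * threshold_bytes


# Example with made-up numbers: a sample of 4096 bytes produced 1229 bytes of index.
estimated = estimate_index_bytes(8192, 4096, 1229)
```

The estimate only tells you when it is worth paying for a close and optimize; the size measured after optimizing is still what decides whether the disc is full.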

Kind Regards
Noel

--------------------------------------------------
From: "Jerry Camel" <rlrc...@msn.com>
Sent: Tuesday, October 27, 2009 4:17 PM
To: <lucene-net-user@incubator.apache.org>
Subject: Re: Monitoring Index Size

It's not an inaccuracy that's the issue. It's that I keep getting exceptions thrown during the process. I was trying to get an approximate size without closing the index. If I've got 14,000 documents to index, closing the index and optimizing after each document is a lot of overhead. But, I fear, that may be my only option...

--------------------------------------------------
From: "Franklin Simmons" <fsimm...@sccmediaserver.com>
Sent: Tuesday, October 27, 2009 12:11 PM
To: <lucene-net-user@incubator.apache.org>
Subject: RE: Monitoring Index Size

Maybe one reason you are not getting an accurate account of the index size is IndexWriter buffering (MaxBufferedDocs). IndexWriter.Flush and IndexWriter.Optimize should prove useful in that regard. IndexWriter's code documentation covers buffering, commits, etc. in fair detail.


-----Original Message-----
From: Jerry Camel [mailto:rlrc...@msn.com]
Sent: Tuesday, October 27, 2009 11:26 AM
To: lucene-net-user@incubator.apache.org
Subject: Re: Monitoring Index Size

Hello, hello... Is this thing on? Can someone please acknowledge that my messages are coming through to the group? I've sent several questions over
the last few weeks and nary a response.  Thanks.

J

--------------------------------------------------
From: "Jerry Camel" <rlrc...@msn.com>
Sent: Monday, October 26, 2009 2:48 PM
To: <lucene-net-user@incubator.apache.org>
Subject: Monitoring Index Size

I've got a project where I need to create DVD-sized collections of indexed
data.  Each disc will contain an index folder and a data folder.
Contents should be obvious.  My question is how can I monitor the index
size as I'm adding data so I can determine when the size of the data plus
the size of the index crosses a pre-determined threshold and I can close
out the disc and move on to the next?

At the moment I'm looping through the index folder and just sizing the
files. But it appears that Lucene is processing, as well, and sometimes I
try to get the size of a file that is no longer there.
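
For illustration, the sizing loop described above could be made tolerant of files that vanish mid-scan. This is a minimal sketch in Python rather than .NET, and `directory_size` is a hypothetical helper name; it assumes the index folder is flat and that a file disappearing between the listing and the size call should simply be skipped.

```python
import os


def directory_size(path):
    """Sum the sizes of the files in path, skipping any file that is
    deleted between listing the folder and reading its size."""
    total = 0
    try:
        names = os.listdir(path)
    except FileNotFoundError:
        return 0  # the folder itself does not exist (yet)
    for name in names:
        try:
            total += os.path.getsize(os.path.join(path, name))
        except FileNotFoundError:
            # The file was removed while we were scanning, e.g. by a
            # segment merge; ignore it and keep going.
            continue
    return total
```

The result is only approximate while the index is open, since documents still buffered in memory are not yet on disk.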

Any advice on how to approach this without having to completely close the
index after each document?

Thanks.

J


