Hi Jerry,
Is it possible for you to sample the data beforehand?
Do you have any idea of the potential size of the document content? Is it
deterministic?
If you index 4K of data, do you end up with an index of about 1.2K in size,
for instance?
Is it possible to use a ratio like that to roughly calculate the size of the
index and use it as a guideline during the load process?
Then, when you get close to your threshold, close and optimize the index to
check its size, and re-open and re-load if required until you reach the
desired size of the index.
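
The arithmetic behind that estimate can be sketched as follows (Python for brevity; the same logic carries over to C#). This is only an illustration under stated assumptions: the ratio of roughly 0.3 comes from the 4K-to-1.2K figure above and would have to be measured on a sample of the real data, and DVD_CAPACITY, SAFETY_MARGIN, and all function names are hypothetical.

```python
# Hypothetical figures: measure the real ratio on a sample of your own data.
DVD_CAPACITY = 4_700_000_000   # bytes; assumed nominal single-layer DVD size
SAFETY_MARGIN = 0.95           # start verifying at 95% of capacity (assumption)


def index_ratio(sample_data_bytes, sample_index_bytes):
    """Index bytes produced per byte of source data, from a sample run."""
    return sample_index_bytes / sample_data_bytes


def estimated_total(data_bytes, ratio):
    """Estimated disc usage: the data itself plus its projected index."""
    return data_bytes + data_bytes * ratio


def should_verify(data_bytes, ratio, capacity=DVD_CAPACITY, margin=SAFETY_MARGIN):
    """True once the estimate is close enough to the limit to justify
    closing/optimizing the index and measuring its real size."""
    return estimated_total(data_bytes, ratio) >= capacity * margin


ratio = index_ratio(4096, 1229)             # roughly 0.3, as in the 4K example
print(should_verify(1_000_000_000, ratio))  # estimate is far below the limit
print(should_verify(3_500_000_000, ratio))  # estimate is past 95% of capacity
```

The estimate only decides when an exact measurement is worth its cost; the real size still has to be checked once the estimate nears the threshold.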
Kind Regards
Noel
--------------------------------------------------
From: "Jerry Camel" <rlrc...@msn.com>
Sent: Tuesday, October 27, 2009 4:17 PM
To: <lucene-net-user@incubator.apache.org>
Subject: Re: Monitoring Index Size
It's not an inaccuracy that's the issue. It's that I keep getting
exceptions thrown during the process. I was trying to get an approximate
size without closing the index. If I've got 14,000 documents to index,
closing the index and optimizing after each document is a lot of overhead.
But, I fear, that may be my only option...
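
One possible middle ground between measuring after every document and never measuring is to pay for an exact measurement only once per batch. The sketch below (Python for brevity) assumes two hypothetical callbacks that are not part of any Lucene API: add_document(doc) adds one document to the index, and measure() flushes or commits and returns the current size in bytes of data plus index.

```python
def load_until_full(documents, add_document, measure, limit, check_every=500):
    """Add documents until the measured size reaches `limit`.

    `add_document` and `measure` are hypothetical callbacks supplied by the
    caller. Returns the number of documents consumed.
    """
    count = 0
    for doc in documents:
        add_document(doc)
        count += 1
        # Pay the cost of an exact measurement only once per batch.
        if count % check_every == 0 and measure() >= limit:
            break
    return count
```

Because the size is checked only at batch boundaries, the result can overshoot by up to one batch, so limit would need to sit below the real disc capacity by at least that much.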
--------------------------------------------------
From: "Franklin Simmons" <fsimm...@sccmediaserver.com>
Sent: Tuesday, October 27, 2009 12:11 PM
To: <lucene-net-user@incubator.apache.org>
Subject: RE: Monitoring Index Size
Maybe one reason you are not getting an accurate account of the index
size is IndexWriter buffering (MaxBufferedDocs). IndexWriter.Flush and
IndexWriter.Optimize should prove useful in that regard. IndexWriter's
code documentation covers buffering, commits, etc. in fair detail.
-----Original Message-----
From: Jerry Camel [mailto:rlrc...@msn.com]
Sent: Tuesday, October 27, 2009 11:26 AM
To: lucene-net-user@incubator.apache.org
Subject: Re: Monitoring Index Size
Hello, hello... Is this thing on? Can someone please acknowledge that my
messages are coming through to the group? I've sent several questions over
the last few weeks and nary a response. Thanks.
J
--------------------------------------------------
From: "Jerry Camel" <rlrc...@msn.com>
Sent: Monday, October 26, 2009 2:48 PM
To: <lucene-net-user@incubator.apache.org>
Subject: Monitoring Index Size
I've got a project where I need to create DVD-sized collections of indexed
data. Each disc will contain an index folder and a data folder.
Contents should be obvious. My question is: how can I monitor the index
size as I'm adding data, so I can determine when the size of the data plus
the size of the index crosses a predetermined threshold and I can close
out the disc and move on to the next?
At the moment I'm looping through the index folder and just sizing the
files. But it appears that Lucene is processing as well, and sometimes I
try to get the size of a file that is no longer there.
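
For illustration, a sizing loop can be written to tolerate files that disappear between listing the folder and reading their sizes. This is a minimal sketch in Python rather than C#, and directory_size is a hypothetical helper name; the same pattern (catch the file-not-found error per file and carry on) should translate to .NET.

```python
import os
import stat


def directory_size(path):
    """Sum the sizes of the regular files directly inside `path`.

    Files that vanish between listing and sizing (for example, files
    removed while the index is being written) are skipped, not raised.
    """
    total = 0
    try:
        names = os.listdir(path)
    except FileNotFoundError:
        return 0  # directory does not exist (yet)
    for name in names:
        try:
            info = os.stat(os.path.join(path, name))
        except FileNotFoundError:
            continue  # deleted after the listing was taken
        if stat.S_ISREG(info.st_mode):
            total += info.st_size
    return total
```

The result is a snapshot of what is on disk at that moment, so it is approximate while the index is still being written.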
Any advice on how to approach this without having to completely close the
index after each document?
Thanks.
J