On Thu, Mar 26, 2009 at 9:51 PM, Marvin Humphrey <mar...@rectangular.com> wrote:
>> eg querying whether
>> compound file format is in use, whether separate norms are stored,
>> "get me total size in bytes of all files" (or maybe just "get me all
>> files", plus utility method somewhere to add up the sizes), so this
>> approach seems doable.
>
> Do you really need all that?  I think the crucial info is already available:
>
>   * The number of docs in each segment.
>   * The number of deletions in each segment, allowing you to calculate the
>     deletion percentage.

I'm just going w/ the info that Log*MergePolicy use today -- checking
CFS, separate dels & norms, is done for "isOptimized"; oh, actually
IndexReader has an isOptimized(), which we could simply use, instead.

> I think it's reasonable to assume an average distribution of document sizes
> across segments.  Sure, that'll be wrong at the long tail of the curve, but
> most of the time it will be right -- and even when it's not, it won't cause
> big problems.

Yeah this might be acceptable in practice, though users who add a
bunch of tiny docs followed by a bunch of big docs (or v/v) may see
poor merge choices.  Maybe in practice it wouldn't be a big deal.

>> But: we don't yet have IndexWriter holding open a reader for every
>> segment.  We are working on realtime search (LUCENE-1516), but even
>> then, if you don't ask for a realtime reader from IndexWriter, it
>> won't hold open SegmentReaders for all segments.
>
> Yeah, that's gonna be a bigger problem.  :(  It's cake to give Lucy's indexer
> a reader, because opening readers is cheap.  But the Lucene heavy-IndexReader
> model messes that up -- IndexWriter has traditionally been a fast class to
> open.

Right, this one seems like the deal breaker: IndexWriter should not in
general go and pool readers on all segments in the index.

Mike
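
PS: for concreteness, here is a rough sketch of the two per-segment numbers
discussed above (doc count and deletions), with live doc count standing in for
segment size under the "average distribution of document sizes" assumption.
The class and method names are made up for illustration and are not Lucene
API; a real merge policy would pull these counts from the segment metadata.

```java
// Hypothetical helper, NOT part of Lucene: derives the per-segment figures
// mentioned in the thread from two counts only.
final class SegmentEstimate {
    private SegmentEstimate() {}

    /**
     * Live (undeleted) docs in a segment. Under the assumption that document
     * sizes are roughly uniform across segments, this doubles as a proxy for
     * the segment's size, so no file sizes need to be queried.
     */
    static int liveDocs(int maxDoc, int delCount) {
        return maxDoc - delCount;
    }

    /** Percentage of the segment's docs that are deleted (0 for an empty segment). */
    static double deletionPercent(int maxDoc, int delCount) {
        if (maxDoc <= 0) {
            return 0.0;
        }
        return 100.0 * delCount / maxDoc;
    }
}
```

The proxy breaks down in exactly the case mentioned above: a segment of many
tiny docs and a segment of a few big docs can have the same liveDocs value
while differing widely in bytes on disk.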