On Thu, Mar 26, 2009 at 07:06:26AM -0400, Michael McCandless wrote:

> We'd need to add a few methods to IndexReader,

Eep. IndexReader's too big as it is.

> eg querying whether
> compound file format is in use, whether separate norms are stored,
> "get me total size in bytes of all files" (or maybe just "get me all
> files", plus utility method somewhere to add up the sizes), so this
> approach seems doable.

Do you really need all that? I think the crucial info is already available:

  * The number of docs in each segment.
  * The number of deletions in each segment, allowing you to calculate the
    deletion percentage.

I think it's reasonable to assume an average distribution of document sizes
across segments. Sure, that'll be wrong at the long tail of the curve, but
most of the time it will be right -- and even when it's not, it won't cause
big problems.

> But: we don't yet have IndexWriter holding open a reader for every
> segment. We are working on realtime search (LUCENE-1516), but even
> then, if you don't ask for a realtime reader from IndexWriter, it
> won't hold open SegmentReaders for all segments.

Yeah, that's gonna be a bigger problem. :(

It's cake to give Lucy's indexer a reader, because opening readers is cheap.
But the Lucene heavy-IndexReader model messes that up -- IndexWriter has
traditionally been a fast class to open.

Marvin Humphrey
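To make the arithmetic above concrete, here is a minimal sketch of what a merge heuristic could compute from just the two per-segment numbers mentioned: the doc count and the deletion count. The class `SegmentStats` and its methods are hypothetical names used only for illustration; they are not part of Lucene or Lucy. The sketch assumes the doc count includes deleted docs, and it uses the live-doc count as a stand-in for segment size, which only holds under the even-document-size assumption described above.

```java
// Hypothetical helper for illustration; not a Lucene or Lucy API.
final class SegmentStats {
    private SegmentStats() {}

    /** Percentage of a segment's docs that are marked deleted (0.0 for an empty segment). */
    static double deletionPercent(int docCount, int deletedCount) {
        checkCounts(docCount, deletedCount);
        if (docCount == 0) {
            return 0.0;
        }
        return 100.0 * deletedCount / docCount;
    }

    /**
     * Number of live (non-deleted) docs. Used here as a rough proxy for
     * segment size, assuming doc sizes are evenly distributed across segments.
     */
    static int liveDocs(int docCount, int deletedCount) {
        checkCounts(docCount, deletedCount);
        return docCount - deletedCount;
    }

    // Rejects negative counts and deletion counts larger than the doc count.
    private static void checkCounts(int docCount, int deletedCount) {
        if (docCount < 0 || deletedCount < 0 || deletedCount > docCount) {
            throw new IllegalArgumentException(
                "invalid counts: docCount=" + docCount + ", deletedCount=" + deletedCount);
        }
    }
}
```

For example, a segment holding 200 docs with 50 deletions gives `deletionPercent(200, 50)` of 25.0 and `liveDocs(200, 50)` of 150. No file sizes are needed for either figure.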