On Fri, Mar 27, 2009 at 12:13 PM, Marvin Humphrey <mar...@rectangular.com> wrote:
>> Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.
>
> I had thought making SegmentReader public was at least under consideration as
> part of the implementation for segment-centric sorted search, but I guess it
> turned out not to be necessary. Still, you have
> IndexReader.getSequentialSubReaders(). That might be enough -- at least for
> this part of the problem. :)

Yes, enough for now I suppose. Though we have LUCENE-831 up next
(fixing FieldCache API).

>> > As for the actual implementation of MergePolicy, I haven't prototyped that
>> > out yet. Right now in KS, the infrastructure is reasonably primitive:
>> > IndexManager has a method called SegReaders_To_Merge() which accepts a
>> > PolyReader as an argument and returns an array of SegReaders representing
>> > content that should be merged.
>>
>> KS does the Fibonacci merge policy right?
>
> Yes.
>
> SegReaders_To_Merge is overridden in certain parts of the test suite, but it's
> not yet public. However, control over merging policy will soon *have* to be
> made public somehow in order to support real-time indexing, so working out an
> API is on my near-term agenda.

Why must merge policy be made public for realtime search?

>> >> Even though Lucy's SegmentReader is lighter weight, it still seems
>> >> like you shouldn't be opening them in the writer (except for realtime
>> >> search)?
>> >
>> > I don't see why not.
>>
>> But it still ties up resources?
>
> Not enough to worry about, I believe.

Hmm OK.

>> EG mmap uses up chunks of your address space (possibly important on 32 bit
>> machines,
>
> This is an important concern, but I believe that design-wise, we have a
> solution[1] -- on 32-bit systems, we only mmap sliding windows rather than
> whole files.

Nice!

> Furthermore, mmap is called with the MAP_SHARED flag, so IndexReaders across
> multiple processes hitting the same exact memory segment get to share it.
> (This is more important under 64-bit systems, where we do map the whole file
> straightaway.)

Great.

>> opening files takes time & descriptors, etc.
>
> Launching an IndexReader is still plenty fast.
>
> Actually, if you're not warming sort caches, launching a Lucene IndexReader
> isn't obscenely expensive any more -- just expensive. Right?

We load deleted docs on init (1 bit per doc = fast), terms index (= a lot
of stuff every 128 terms = maybe slow), norms on the first search that
hits that field (1 byte per doc = probably OK), and FieldCache on first
search that uses it. So "it depends" I guess?

> [1] At least on Unixen. I believe we can support all of this using Windows
> MapViewOfFile and friends, and I had a crude prototype working before, but
> right now Windows is still using the old-school load-into-process-memory
> style.

Excellent!

Mike
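
For readers following along, here is a minimal sketch of the "mmap sliding windows rather than whole files" idea discussed above. It is not Lucy's or KS's actual code: the class name SlidingWindowReader, its methods, and the default window size are all hypothetical, and Python's standard mmap module stands in for the C-level mmap call. The point is only that a reader can keep one bounded mapping alive and remap it on demand, so address-space use stays capped at the window size regardless of file size.

```python
import mmap
import os


class SlidingWindowReader:
    """Hypothetical helper: read a file through one bounded mmap window."""

    def __init__(self, path, window_size=4 * mmap.ALLOCATIONGRANULARITY):
        gran = mmap.ALLOCATIONGRANULARITY
        # Window starts must be multiples of the allocation granularity,
        # so round the window size down to a multiple of it (minimum one).
        self._window_size = max(gran, (window_size // gran) * gran)
        self._file = open(path, "rb")
        self._file_size = os.fstat(self._file.fileno()).st_size
        self._map = None
        self._map_start = 0
        self._map_len = 0

    def _remap(self, offset):
        # Drop the old window before mapping the one that covers `offset`.
        if self._map is not None:
            self._map.close()
        start = (offset // self._window_size) * self._window_size
        length = min(self._window_size, self._file_size - start)
        self._map = mmap.mmap(
            self._file.fileno(), length, access=mmap.ACCESS_READ, offset=start
        )
        self._map_start = start
        self._map_len = length

    def read(self, offset, length):
        """Return `length` bytes starting at `offset`, sliding as needed."""
        if offset < 0 or length < 0 or offset + length > self._file_size:
            raise ValueError("read out of range")
        chunks = []
        while length > 0:
            in_window = (
                self._map is not None
                and self._map_start <= offset < self._map_start + self._map_len
            )
            if not in_window:
                self._remap(offset)
            rel = offset - self._map_start
            n = min(length, self._map_len - rel)
            chunks.append(self._map[rel:rel + n])
            offset += n
            length -= n
        return b"".join(chunks)

    def close(self):
        if self._map is not None:
            self._map.close()
            self._map = None
        self._file.close()
```

A caller would construct SlidingWindowReader(path), call read(offset, length) for whatever byte ranges it needs, and close() it when done. A read that straddles a window boundary is served by remapping partway through. On a 64-bit system the same interface could simply map the whole file once, which matches the split described in the quoted text.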