I looked at the IndexWriter code with regard to creating a realtime reader. With the many flexible indexing classes, I'm unsure how one would get a frozen-ish IndexInput over the byte slices. How would that work, given that the byte slices are attached to different threads?
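A minimal sketch of Mike's suggestion from the quoted thread (the realtime reader stores only the maxDocID it is allowed to search), under loud assumptions. `RamPostings`, `Slice`, and `Reader` are hypothetical stand-ins, not Lucene classes. Postings are plain per-term int arrays rather than IndexWriter's per-thread byte slices, and a single lock guards the writer. The reader never freezes or copies the buffer. It grabs the live array and fill count under the lock and stops at the first docID past its cap, relying on docIDs being appended in increasing order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene's API: an append-only in-RAM postings
// buffer, and readers that record only the maxDocID they may search.
public class MaxDocIdSnapshotDemo {

    /** A term's postings array plus how many of its slots are filled. */
    static final class Slice {
        final int[] docIDs;
        final int count;

        Slice(int[] docIDs, int count) {
            this.docIDs = docIDs;
            this.count = count;
        }
    }

    /** Writer side: per-term arrays of increasing docIDs, guarded by one lock. */
    static final class RamPostings {
        private final Map<String, int[]> docs = new HashMap<>();
        private final Map<String, Integer> counts = new HashMap<>();
        private int nextDocID = 0;

        /** Buffers one document with the given distinct terms; returns its docID. */
        synchronized int addDocument(String... terms) {
            int docID = nextDocID++;
            for (String term : terms) {
                int n = counts.getOrDefault(term, 0);
                int[] arr = docs.getOrDefault(term, new int[2]);
                if (n == arr.length) {
                    // Grow by copying; slots already filled are never rewritten.
                    arr = Arrays.copyOf(arr, n * 2);
                }
                arr[n] = docID;
                docs.put(term, arr);
                counts.put(term, n + 1);
            }
            return docID;
        }

        /** Opens a reader capped at the last docID buffered so far. */
        synchronized Reader openReader() {
            return new Reader(this, nextDocID - 1);
        }

        /** Returns the live array and fill count (no copy) under the lock. */
        synchronized Slice currentSlice(String term) {
            return new Slice(docs.getOrDefault(term, new int[0]),
                             counts.getOrDefault(term, 0));
        }
    }

    /** Reader side: no frozen copy of the buffer, just a docID cap. */
    static final class Reader {
        private final RamPostings postings;
        private final int maxDocID;

        Reader(RamPostings postings, int maxDocID) {
            this.postings = postings;
            this.maxDocID = maxDocID;
        }

        /** Matching docIDs, ignoring any buffered after this reader opened. */
        List<Integer> termDocs(String term) {
            Slice s = postings.currentSlice(term);
            List<Integer> out = new ArrayList<>();
            // docIDs are appended in increasing order, so stop at the first
            // one past the cap even if the writer has appended more since.
            for (int i = 0; i < s.count && s.docIDs[i] <= maxDocID; i++) {
                out.add(s.docIDs[i]);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        RamPostings writer = new RamPostings();
        writer.addDocument("lucene", "search"); // doc 0
        writer.addDocument("lucene");           // doc 1
        Reader r = writer.openReader();         // maxDocID = 1
        writer.addDocument("lucene", "search"); // doc 2, invisible to r
        System.out.println(r.termDocs("lucene"));                   // [0, 1]
        System.out.println(writer.openReader().termDocs("lucene")); // [0, 1, 2]
    }
}
```

The design choice is that a reader costs one int, not a per-term copy. Slots below the fill count are written before the writer releases its lock, so a reader that read the count under that lock sees them safely. Whether the same holds for slices owned by different indexing threads is exactly the open question above.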
On Fri, Apr 3, 2009 at 2:42 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> > I think the realtime reader'd just store the maxDocID it's allowed to
> > search, and we would likely keep using the RAM format now used.
>
> Sounds pretty good. Are there any other gotchas in the design?
>
> On Thu, Apr 2, 2009 at 1:40 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> On Wed, Apr 1, 2009 at 7:05 PM, Jason Rutherglen
>> <jason.rutherg...@gmail.com> wrote:
>> > Now that LUCENE-1516 is close to being committed, perhaps we can
>> > figure out the priority of other issues:
>> >
>> > 1. Searchable IndexWriter RAM buffer
>>
>> I think the first priority is to get a good assessment of the performance
>> of the current implementation (from LUCENE-1516).
>>
>> My initial tests are very promising: with a writer updating (replacing
>> random docs) at 50 docs/second on a full (3.2 M) Wikipedia index, I
>> was able to reopen the reader once per second and do a large
>> (> 500K results) search that sorts by date. The reopen time was
>> typically ~40 msec, and search time typically ~35 msec (though there
>> were random spikes up to ~340 msec). Though, these results were on an
>> SSD (Intel X25M 160 GB).
>>
>> We need more data points on the current approach, but this looks likely
>> to be good enough for starters. And since we can get it into 2.9,
>> hopefully it'll get some early usage and people will report back to
>> help us assess whether further performance improvements are necessary.
>>
>> If they do turn out to be necessary, I think before your step 1, we
>> should write small segments into a RAMDirectory instead of the "real"
>> directory. That's simpler than truly searching IndexWriter's
>> in-memory postings data.
>>
>> > 2. Finish up benchmarking and perhaps implement passing
>> > filters to the SegmentReader level
>>
>> What is "passing filters to the SegmentReader level"?
>> E.g., as of LUCENE-1483, we now ask a Filter for its DocIdSet once per
>> SegmentReader.
>>
>> > 3. Deleting by doc id using IndexWriter
>>
>> We need a clean approach for the "docIDs suddenly shift when a merge is
>> committed" problem for this...
>>
>> Thinking more on this... I think one possible solution may be to
>> somehow expose IndexWriter's internal docID remapping code.
>> IndexWriter does delete by docID internally, and whenever a merge is
>> committed we stop the world (sync on IW) and go remap those docIDs.
>> If we somehow allowed the user to register a callback that we could call
>> when this remapping occurs, then the user's code could carry the docIDs
>> without them becoming stale. Or maybe we could make a class
>> "PendingDocIDs", which you'd ask the reader to give you, that holds
>> docIDs and remaps them after each merge. The problem is, IW
>> internally always logically switches to the current reader for any
>> further docID deletion, but the user's code may continue to use an old
>> reader. So simply exposing this remapping won't fix it... we'd need
>> to somehow track the genealogy (quite a bit more complex).
>>
>> > With 1), I'm interested in how we will lock a section of the
>> > bytes for use by a given reader. We would not actually lock
>> > them, but we need to set aside the bytes such that, for example,
>> > if the postings grow, TermDocs iteration does not progress
>> > beyond its limits. Are any modifications needed to the
>> > RAM buffer format? How would the term table be stored? We
>> > would not be using the current hash method?
>>
>> I think the realtime reader'd just store the maxDocID it's allowed to
>> search, and we would likely keep using the RAM format now used.
>>
>> Mike
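Mike's callback and "PendingDocIDs" ideas for deleting by docID can be sketched as follows, under stated assumptions. `DocIDRemapListener`, `PendingDocIDs`, and `compactionMap` are hypothetical names, not IndexWriter internals. The writer is assumed to hand the listener an old-to-new docID map while it holds its lock during the merge commit. Here that map is built by compacting away docs flagged as removed by the merge, which shifts every later docID down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's API: a user-held set of docIDs that a
// writer would remap through a callback whenever a merge commits.
public class PendingDocIDsDemo {

    /** Hypothetical callback the writer would invoke while it remaps docIDs. */
    interface DocIDRemapListener {
        /** oldToNew[i] is old docID i's new docID, or -1 if the merge dropped it. */
        void docIDsRemapped(int[] oldToNew);
    }

    /** Holds docIDs pending deletion and keeps them current across merges. */
    static final class PendingDocIDs implements DocIDRemapListener {
        private final List<Integer> docIDs = new ArrayList<>();

        synchronized void add(int docID) {
            docIDs.add(docID);
        }

        synchronized List<Integer> current() {
            return new ArrayList<>(docIDs);
        }

        @Override
        public synchronized void docIDsRemapped(int[] oldToNew) {
            List<Integer> remapped = new ArrayList<>(docIDs.size());
            for (int oldID : docIDs) {
                int newID = oldToNew[oldID];
                if (newID != -1) {
                    remapped.add(newID); // doc survived the merge under a new id
                }
                // newID == -1: doc was already deleted and compacted away
            }
            docIDs.clear();
            docIDs.addAll(remapped);
        }
    }

    /** Old-to-new map when the docs flagged in removedByMerge are compacted out. */
    static int[] compactionMap(boolean[] removedByMerge) {
        int[] map = new int[removedByMerge.length];
        int next = 0;
        for (int i = 0; i < removedByMerge.length; i++) {
            map[i] = removedByMerge[i] ? -1 : next++;
        }
        return map;
    }

    public static void main(String[] args) {
        PendingDocIDs pending = new PendingDocIDs();
        pending.add(2);
        pending.add(3);
        pending.add(5);
        // A merge commits and compacts away already-deleted docs 1 and 3.
        boolean[] removed = {false, true, false, true, false, false};
        pending.docIDsRemapped(compactionMap(removed));
        System.out.println(pending.current()); // [1, 3]
    }
}
```

As Mike notes, this only keeps docIDs current relative to the writer's latest view. DocIDs a user obtained from an older reader would still need the reader genealogy tracked, which this sketch does not attempt.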