> The realtime reader would have to have sub-readers per thread, and an aggregate reader that "joins" them by interleaving the docIDs
Nice (i.e. nice and complex)! Not knowing too much about the internals, how would the interleaving work? Does each sub-reader have a "start", a la Multi*Reader? Or are the doc IDs incremented from a synced place such that no two readers have the same doc ID?

> BTW there are benefits to not reusing the RAM buffer, outside of faster near real-time search

Does not reusing the RAM buffer mean not reusing the pooled byte arrays after a flush, or something else?

> thus allowing add/deletes in other threads to run. Currently they are all blocked ("stop the world") during flush

SSDs are cool, but I can't see management approving them quite yet. Are there many places piloting Lucene on SSDs that you're aware of?

From what you've said so far, this is how I understand realtime RAM buffer readers could work: there'd be an IndexWriter.getRAMReader method that gathers all the RAM buffers from the various threads and marks a doc ID as the last one for the overall RAMBufferMultiReader. A new set of classes (RAMBufferTermEnum, RAMBufferTermDocs, RAMBufferTermPositions) would be implemented that can read from the RAM buffer.

I don't think the current field cache API would like growing arrays? Hopefully that's something LUCENE-831 will support.

On Sat, Apr 4, 2009 at 4:46 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> On Fri, Apr 3, 2009 at 8:01 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
> > I looked at the IndexWriter code in regards to creating a realtime reader,
> > with the many flexible indexing classes I'm unsure of how one would get a
> > frozenish IndexInput of the byte slices, given the byte slices are attached
> > to different threads?
>
> The realtime reader would have to have sub-readers per thread, and an
> aggregate reader that "joins" them by interleaving the docIDs. When
> flushing we create such a beast, but, it's not general purpose (ie it
> does not implement IndexReader API; it only implements enough to write
> the postings).
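
To make the two options in my question concrete, here is a rough sketch of what each could look like. It assumes nothing about the real DocumentsWriter internals: DocIdJoinSketch, starts and interleave are hypothetical names, and plain int arrays stand in for the per-thread buffers. Option A gives each sub-reader a base offset, Multi*Reader style; option B assumes doc IDs come from one shared counter, so each thread holds an ascending but gappy list that the aggregate reader merges.

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names exist in Lucene.
class DocIdJoinSketch {

    // Option A: per-sub-reader "start" offsets, Multi*Reader style.
    // maxDocs[i] is the doc count of sub-reader i; starts[i] is its base,
    // and the last entry is the total doc count.
    static int[] starts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    // Option B: doc IDs are handed out from one shared counter (an assumption),
    // so each thread's list is ascending but has gaps. A k-way merge
    // "interleaves" the lists back into global doc ID order.
    static int[] interleave(List<int[]> perThreadDocIds) {
        // Heap entries are {docId, threadIndex, positionInThatThreadsList}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        int total = 0;
        for (int t = 0; t < perThreadDocIds.size(); t++) {
            int[] ids = perThreadDocIds.get(t);
            total += ids.length;
            if (ids.length > 0) {
                heap.add(new int[] {ids[0], t, 0});
            }
        }
        int[] merged = new int[total];
        int out = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged[out++] = top[0];
            int[] ids = perThreadDocIds.get(top[1]);
            int next = top[2] + 1;
            if (next < ids.length) {
                heap.add(new int[] {ids[next], top[1], next});
            }
        }
        return merged;
    }
}
```

With option A a global doc ID is just the sub-reader's start plus its local ID, so no merge is needed; with option B the IDs are already global, but every enumeration has to merge across threads.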
>
> BTW there are benefits to not reusing the RAM buffer, outside of
> faster near real-time search: it would allow flushing to be done in
> the BG. Ie, flush could start, and we'd immediately switch to a new
> RAM buffer, thus allowing add/deletes in other threads to run.
> Currently they are all blocked ("stop the world") during flush, though
> it's not clear on a fast IO device (SSD) how big a deal this "stop the
> world" really is to indexing throughput.
>
> But still it's a complex change.
>
> Mike
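
For what it's worth, the "switch to a new RAM buffer when the flush starts" idea might look roughly like the sketch below. This is only an illustration under loud assumptions: SwappingBufferSketch and its methods are made-up names, a list of strings stands in for the RAM buffer, and an in-memory list stands in for the segment written to disk. The point is that the lock is held only for the swap, so adds in other threads are not blocked while the old buffer is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not how Lucene's DocumentsWriter is written.
class SwappingBufferSketch {
    private final Object lock = new Object();
    // Stand-in for the active RAM buffer.
    private List<String> active = new ArrayList<>();
    // Stand-in for segments written to disk.
    private final List<List<String>> flushed = new ArrayList<>();

    // Adds only contend on the short lock, never on the slow write.
    void add(String doc) {
        synchronized (lock) {
            active.add(doc);
        }
    }

    // Swap in a fresh buffer under the lock (cheap), then do the slow
    // write outside it. A real version could hand toFlush to a BG thread.
    void flush() {
        List<String> toFlush;
        synchronized (lock) {
            toFlush = active;
            active = new ArrayList<>();
        }
        writeSegment(toFlush);
    }

    // Placeholder for the expensive IO step.
    private void writeSegment(List<String> docs) {
        synchronized (flushed) {
            flushed.add(docs);
        }
    }

    int flushedSegments() {
        synchronized (flushed) {
            return flushed.size();
        }
    }

    int bufferedDocs() {
        synchronized (lock) {
            return active.size();
        }
    }
}
```

The cost is the one noted above: the old buffer is not reused, so memory for two buffers can be live at once while a flush is in progress.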