> The realtime reader would have to have sub-readers per thread,
and an aggregate reader that "joins" them by interleaving the
docIDs

Nice (i.e. nice and complex)! Not knowing too much about the
internals, how would the interleaving work? Does each subreader
have a "start" a la Multi*Reader? Or are the doc ids assigned
from a synchronized counter so that no two readers ever share a
doc id?
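
To make the two options concrete, here is a rough sketch in plain
Java. Nothing here is Lucene API; DocIdSchemes, starts, globalDoc
and SharedDocIds are names I made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DocIdSchemes {
    // Option 1 (Multi*Reader style): each sub-reader numbers its docs
    // from 0, and the aggregate adds a per-sub-reader "start" offset.
    static int[] starts(int[] subMaxDocs) {
        int[] starts = new int[subMaxDocs.length + 1];
        for (int i = 0; i < subMaxDocs.length; i++) {
            starts[i + 1] = starts[i] + subMaxDocs[i];
        }
        return starts;
    }

    // Maps a sub-reader-local doc id to a global one.
    static int globalDoc(int[] starts, int sub, int localDoc) {
        return starts[sub] + localDoc;
    }

    // Option 2: every indexing thread draws doc ids from one shared
    // counter, so ids are globally unique and no offsets are needed.
    static final class SharedDocIds {
        private final AtomicInteger next = new AtomicInteger();

        int assign() {
            return next.getAndIncrement();
        }
    }
}
```

With option 1 the offsets only stay valid while the sub-readers
don't grow, which is why I'm guessing the per-thread buffers would
need something closer to option 2.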

> BTW there are benefits to not reusing the RAM buffer, outside
of faster near real-time search

Does not reusing the RAM buffer mean not reusing the pooled byte
arrays after a flush, or something else?

> thus allowing add/deletes in other threads to run. Currently
they are all blocked ("stop the world") during flush

SSDs are cool, but I can't see management approving them quite
yet. Are there many places piloting Lucene on SSDs that you're
aware of?

From what you've said so far, this is how I understand realtime
RAM buffer readers could work:

There'd be an IndexWriter.getRAMReader method that gathers all
the RAM buffers from the various threads and marks a doc id as
the last one for the overall RAMBufferMultiReader. A new set of
classes (RAMBufferTermEnum, RAMBufferTermDocs and
RAMBufferTermPositions) would be implemented to read from the
RAM buffer.
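
As a rough sketch of what the "join" might look like for a single
term, assuming each thread's buffer hands back its doc ids in
ascending order and ids are unique across threads (plain Java, not
Lucene code; InterleavedDocs and interleave are made-up names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class InterleavedDocs {
    /**
     * Merges per-thread ascending doc id lists for one term into a
     * single ascending list, ignoring docs after lastDoc (the doc id
     * marked as the last one when the reader was opened).
     */
    static List<Integer> interleave(List<int[]> perThread, int lastDoc) {
        // Queue entries are {docId, threadIndex, positionInThatThread}.
        PriorityQueue<int[]> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int t = 0; t < perThread.size(); t++) {
            int[] docs = perThread.get(t);
            if (docs.length > 0) {
                pq.add(new int[] {docs[0], t, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            if (top[0] > lastDoc) {
                break; // smallest remaining doc is past the cutoff
            }
            merged.add(top[0]);
            int[] docs = perThread.get(top[1]);
            int nextPos = top[2] + 1;
            if (nextPos < docs.length) {
                pq.add(new int[] {docs[nextPos], top[1], nextPos});
            }
        }
        return merged;
    }
}
```

A RAMBufferTermDocs would presumably do this lazily, one doc at a
time, rather than building a list, but the ordering logic would be
the same.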

I don't think the current field cache API would cope well with
growing arrays. Hopefully that's something LUCENE-831 will
support.

On Sat, Apr 4, 2009 at 4:46 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Fri, Apr 3, 2009 at 8:01 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
> > I looked at the IndexWriter code in regards to creating a realtime reader,
> > with the many flexible indexing classes I'm unsure of how one would get a
> > frozenish IndexInput of the byte slices, given the byte slices are attached
> > to different threads?
>
> The realtime reader would have to have sub-readers per thread, and an
> aggregate reader that "joins" them by interleaving the docIDs.  When
> flushing we create such a beast, but, it's not general purpose (ie it
> does not implement IndexReader API; it only implements enough to write
> the postings).
>
> BTW there are benefits to not reusing the RAM buffer, outside of
> faster near real-time search: it would allow flushing to be done in
> the BG.  Ie, flush could start, and we'd immediately switch to a new
> RAM buffer, thus allowing add/deletes in other threads to run.
> Currently they are all blocked ("stop the world") during flush, though
> it's not clear on a fast IO device (SSD) how big a deal this "stop the
> world" really is to indexing throughput.
>
> But still it's a complex change.
>
> Mike
