[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

Jason Rutherglen (JIRA) Mon, 10 Jan 2011 08:39:23 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979631#action_12979631
 ]


Jason Rutherglen commented on LUCENE-2312:
------------------------------------------

{quote}Is that right that future RT readers are no longer immutable snapshots
(in a sense that they have variable maxDoc)?{quote}

The RT readers will be point-in-time. There are several mechanisms to make this
happen, and they mainly revolve around a static maxDoc per reader while allowing
some of the underlying data structures to change during indexing. There are two
open design issues right now: how to handle norms, and the System.arraycopy
per getReader call needed to create static, read-only parallel upto arrays.
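
A minimal sketch of the copy-per-getReader idea, assuming a single per-term
"upto" array for brevity. The names RamBufferSketch, Snapshot, advance and
finishDocument are hypothetical and are not part of Lucene or of the attached
patches; only System.arraycopy and the static maxDoc come from the discussion
above.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names are Lucene APIs. */
final class RamBufferSketch {
    // Per-term write position, indexed by term ID; mutated while indexing.
    private int[] freqUpto = new int[8];
    private int numTerms = 0;
    private int maxDoc = 0;

    /** Records that the postings of termId now end at endOffset. */
    synchronized void advance(int termId, int endOffset) {
        if (termId >= freqUpto.length) {
            freqUpto = Arrays.copyOf(freqUpto, Math.max(termId + 1, freqUpto.length * 2));
        }
        freqUpto[termId] = endOffset;
        numTerms = Math.max(numTerms, termId + 1);
    }

    /** Called once per indexed document. */
    synchronized void finishDocument() {
        maxDoc++;
    }

    /** Copies the live array so the returned view never changes afterwards. */
    synchronized Snapshot getReader() {
        int[] copy = new int[numTerms];
        System.arraycopy(freqUpto, 0, copy, 0, numTerms);
        return new Snapshot(maxDoc, copy);
    }

    /** Point-in-time view: fixed maxDoc and a private copy of the upto array. */
    static final class Snapshot {
        private final int maxDoc;
        private final int[] freqUpto;

        Snapshot(int maxDoc, int[] freqUpto) {
            this.maxDoc = maxDoc;
            this.freqUpto = freqUpto;
        }

        int maxDoc() { return maxDoc; }
        int numTerms() { return freqUpto.length; }
        int freqUpto(int termId) { return freqUpto[termId]; }
    }
}
```

The cost per getReader here is one array copy proportional to the number of
terms in the buffer, which is the overhead in question.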

I think System.arraycopy should be fast enough, given that the JVM runs it as
optimized native code on Intel. For norms, we may need to relax their accuracy
in order to create less garbage. The options are either a byte[][], which
preserves point-in-timeness, or a byte[] that is recalculated only when it is
grown (meaning that readers created since the last array growth may see a
slightly inaccurate norm value). The norm byte[] would essentially be grown
every N docs.
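
A rough sketch of the byte[][] option only, under two assumptions that are not
stated above: a norm is written once per doc and never updated in place, and
each inner block holds N docs. BlockedNormsSketch, View and BLOCK_SIZE are
hypothetical names, not Lucene APIs.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names are Lucene APIs. */
final class BlockedNormsSketch {
    static final int BLOCK_SIZE = 4; // "N" docs per block, tiny for illustration

    private byte[][] blocks = new byte[0][];
    private int maxDoc = 0;

    /** Appends the norm of the next doc, allocating a new block every N docs. */
    synchronized void addNorm(byte norm) {
        int block = maxDoc / BLOCK_SIZE;
        if (block == blocks.length) {
            // Replace the outer array instead of mutating the published one,
            // so older views keep the outer array they were opened with.
            byte[][] grown = Arrays.copyOf(blocks, blocks.length + 1);
            grown[block] = new byte[BLOCK_SIZE];
            blocks = grown;
        }
        blocks[block][maxDoc % BLOCK_SIZE] = norm;
        maxDoc++;
    }

    /** Shares the existing blocks; only maxDoc is fixed for the new view. */
    synchronized View getReader() {
        return new View(blocks, maxDoc);
    }

    /** Point-in-time view: reads are limited to docs below its own maxDoc. */
    static final class View {
        private final byte[][] blocks;
        private final int maxDoc;

        View(byte[][] blocks, int maxDoc) {
            this.blocks = blocks;
            this.maxDoc = maxDoc;
        }

        int maxDoc() { return maxDoc; }

        byte norm(int docId) {
            if (docId < 0 || docId >= maxDoc) {
                throw new IndexOutOfBoundsException("docId " + docId);
            }
            return blocks[docId / BLOCK_SIZE][docId % BLOCK_SIZE];
        }
    }
}
```

In this sketch the only garbage is one outer-array copy and one new block per
N docs; a getReader call copies no norm bytes. If norms could be updated in
place, a copy of the affected block would also be needed.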



> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
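
To illustrate the terms dictionary problem described above, here is a naive
sketch that sorts the hashed terms each time a reader is opened. It is not
Michael B's suggestion from the linked comment and not how IndexWriter's
specialized hash table works; TermsSnapshotSketch and its methods are
hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names are Lucene APIs. */
final class TermsSnapshotSketch {
    // Unsorted term -> term ID table, filled while indexing.
    private final Map<String, Integer> termIds = new HashMap<>();

    /** Returns the ID of the term, assigning the next free ID if it is new. */
    synchronized int addTerm(String term) {
        Integer id = termIds.get(term);
        if (id == null) {
            id = termIds.size();
            termIds.put(term, id);
        }
        return id;
    }

    /** Builds a sorted copy of the current terms for one reader. */
    synchronized String[] sortedTerms() {
        String[] terms = termIds.keySet().toArray(new String[0]);
        Arrays.sort(terms);
        return terms;
    }
}
```

Sorting on every reader open costs O(n log n) in the number of buffered terms,
which is why this area needs a better design for frequent reopens.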
