[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

Michael Busch (JIRA) Mon, 22 Mar 2010 10:41:53 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848226#action_12848226
 ]


Michael Busch commented on LUCENE-2312:
---------------------------------------

I think sync'ing after every doc is probably the better option.  We'll still 
avoid the need to make all variables downstream of DocumentsWriter 
volatile/atomic, which should be a nice performance gain.
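
To illustrate the per-document sync, here is a minimal Java sketch that assumes a single writer thread per buffer. The names (PerDocSync, docValues, visibleDocs) are hypothetical and are not Lucene APIs. The writer fills plain fields and then does one volatile write per document. Under the Java Memory Model, a reader that does the volatile read first is guaranteed to see the plain writes that preceded the matching volatile write.

```java
// Hypothetical illustration only; not Lucene code.
class PerDocSync {
    // Plain (non-volatile) buffer, written only by the single writer thread.
    private final int[] docValues = new int[1024];
    // Plain counter, touched only by the writer thread.
    private int written = 0;
    // The only volatile field: number of documents readers may look at.
    private volatile int visibleDocs = 0;

    // Must be called by the single writer thread only.
    void addDocument(int value) {
        docValues[written] = value;   // plain write
        written++;
        visibleDocs = written;        // volatile write publishes the plain writes above
    }

    // May be called by any reader thread.
    int sumVisible() {
        int limit = visibleDocs;      // volatile read comes first
        int sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += docValues[i];      // plain reads, safe for indexes below limit
        }
        return sum;
    }
}
```

Readers only touch entries below the limit they read, so nothing downstream of the counter needs to be volatile or atomic.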

The problem with the delayed sync'ing (after e.g. 100 docs) is that if you 
don't have a never-ending stream of twee... err documents, then you might want 
to force an explicit sync at some point.  But that's very hard, because you 
would have to force the writer thread to make e.g. a volatile write via an API 
call.  And if that's an IndexWriter writer API that has to trigger the sync on 
multiple DocumentsWriter instances (i.e. multiple writer threads) I don't see 
how that's possible unless Lucene manages its own pool of threads.
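
The difficulty with delayed sync'ing can be sketched the same way. This is again a hypothetical Java illustration (BatchedSync, BATCH and sync() are made-up names, not Lucene APIs), assuming one writer thread per buffer and a batch size of 100. Documents added since the last batch boundary stay invisible until sync() runs, and sync() is only correct when the writer thread itself calls it, because the volatile write has to come after that thread's plain writes.

```java
// Hypothetical illustration only; not Lucene code.
class BatchedSync {
    private static final int BATCH = 100;   // assumed batch size
    private final int[] docValues = new int[1024];
    private int written = 0;                 // plain, writer thread only
    private volatile int visibleDocs = 0;    // published document count

    // Must be called by the single writer thread only.
    void addDocument(int value) {
        docValues[written] = value;          // plain write
        written++;
        if (written % BATCH == 0) {
            sync();                          // publish once per batch
        }
    }

    // Must also run on the writer thread: another thread calling this
    // would read 'written' without synchronization, and its volatile
    // write would not publish the writer's plain writes.
    void sync() {
        visibleDocs = written;
    }

    // May be called by any reader thread.
    int visibleCount() {
        return visibleDocs;
    }
}
```

If the document stream stops after 150 documents, readers keep seeing 100 until the writer thread is somehow made to call sync(), which is the API problem described above.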

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


