[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

Michael McCandless (JIRA) Mon, 15 Mar 2010 02:37:52 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845255#action_12845255 ]


Michael McCandless commented on LUCENE-2312:
--------------------------------------------

Yes, commit should flush & sync all doc writers, and rollback must abort all of 
them.
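
To make that contract concrete, here is a minimal sketch. The names DocWriter and MultiWriterCoordinator, and their methods, are hypothetical and for illustration only; they are not Lucene's actual classes. The sketch assumes only that several independent doc writers exist and that commit and rollback must reach every one of them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-thread writer; NOT a Lucene class. */
interface DocWriter {
    void flush();  // write buffered docs out as a new segment
    void sync();   // force the written files to stable storage
    void abort();  // discard everything still buffered
}

/** Hypothetical coordinator that fans commit/rollback out to all writers. */
final class MultiWriterCoordinator {
    private final List<DocWriter> writers = new ArrayList<>();

    void add(DocWriter w) {
        writers.add(w);
    }

    /** commit: flush every writer first, then sync every writer. */
    void commit() {
        for (DocWriter w : writers) {
            w.flush();
        }
        for (DocWriter w : writers) {
            w.sync();
        }
    }

    /** rollback: abort every writer, even if an earlier abort throws. */
    void rollback() {
        RuntimeException first = null;
        for (DocWriter w : writers) {
            try {
                w.abort();
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e; // remember the first failure, keep aborting the rest
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Flushing all writers before syncing any of them is one possible ordering, chosen here only to keep the two phases visibly separate.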

bq. I also have a separate indexing chain prototype working with searchable RAM 
buffer (single-threaded)

Yay!

bq. but slightly different postinglist format (some docs nowadays only have 140 
characters).

New sponsor, eh?  ;)

But, yes, I suspect an indexer chain optimized for tiny docs can get sizable 
gains.

What change to the postings format?  Is the change only in the RAM
buffer or also in the index?  If it's in the index... we should
probably do this under flex.

bq. It seems really fast. I spent a long time thinking about lock-free 
algorithms and data structures, so indexing performance should be completely 
independent of the search load (in theory). I need to think a bit more about 
how to make it work with "normal" documents and Lucene's current in-memory 
format.

Sounds like awesome progress!!  Want some details over here :)


> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
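
The terms-dictionary problem in the description above can be sketched as follows. This is one straightforward option (sort a snapshot of the hash table's keys on demand), not necessarily the approach suggested in the linked comment. RamTermBuffer and its methods are hypothetical names, not Lucene's actual data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-RAM buffer; NOT Lucene's actual structure. */
final class RamTermBuffer {
    // Unsorted hash table: cheap to update while indexing.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    void addTerm(String term, int docId) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    /** Builds a sorted snapshot of the terms when a search needs one. */
    String[] sortedTerms() {
        String[] terms = postings.keySet().toArray(new String[0]);
        Arrays.sort(terms);
        return terms;
    }

    /** Returns all terms with the given prefix from a sorted snapshot. */
    static List<String> prefixScan(String[] sorted, String prefix) {
        int i = Arrays.binarySearch(sorted, prefix);
        if (i < 0) {
            i = -i - 1; // not found: convert to the insertion point
        }
        List<String> out = new ArrayList<>();
        while (i < sorted.length && sorted[i].startsWith(prefix)) {
            out.add(sorted[i++]);
        }
        return out;
    }
}
```

Sorting per snapshot costs O(n log n) in the number of unique terms, which is the overhead a smarter structure would aim to avoid; the sketch only shows why range and prefix queries need sorted order at all.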

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
