[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

Michael McCandless (JIRA) Sun, 14 Mar 2010 03:01:52 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845059#action_12845059
 ]


Michael McCandless commented on LUCENE-2312:
--------------------------------------------

bq. IW flush could become thread dependent 

Right, we want this -- different RAM segments should be flushed at different 
times.  This gives us better concurrency since IO/CPU resource consumption will 
now be more interleaved.  While one RAM segment is flushing, the others are 
still indexing.

{quote}
A new term will first check the hash table for existence (as
currently), if it's not in the term hash table only then will it
be added to the btree (btw, binary search is O(log N) on
average?) This way we're avoiding the somewhat costlier btree
existence check per token.
{quote}

Yes, we could keep a btree on the side but still use the hash for mapping (vs 
using the btree alone).  The hash will give faster lookups... the btree could be 
created/updated on demand, the first time something needs to .next() through 
the TermsEnum.
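
A minimal sketch of that idea, assuming a single indexing thread.  The class 
and method names here are hypothetical (this is not the existing terms hash in 
IW), and a java.util.TreeSet stands in for the btree:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: hash for lookups, sorted view built only on demand. */
class LazySortedTerms {
    // Fast path: every token goes through the hash table only.
    private final Map<String, Integer> termToId = new HashMap<>();
    // Sorted side structure (TreeSet standing in for the btree); null until needed.
    private TreeSet<String> sorted;

    /** Returns the id of the term, assigning the next free id if it is new. */
    int add(String term) {
        Integer id = termToId.get(term);
        if (id != null) {
            return id; // existing term: no sorted-structure work at all
        }
        int newId = termToId.size();
        termToId.put(term, newId);
        if (sorted != null) {
            sorted.add(term); // keep the sorted view current only once it exists
        }
        return newId;
    }

    /** Iterates terms in sorted order, building the sorted view on first use. */
    Iterator<String> sortedTerms() {
        if (sorted == null) {
            sorted = new TreeSet<>(termToId.keySet());
        }
        return sorted.iterator();
    }

    /** True once something has asked for sorted iteration. */
    boolean sortedViewBuilt() {
        return sorted != null;
    }
}
```

Until something calls sortedTerms(), indexing pays only for the hash lookup; 
after that, only new terms pay the extra O(log N) insert.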

{quote}
The algorithm for flushing doc writers based on RAM
consumption can simply be, on exceed, flush the doc writer
consuming the most RAM
{quote}

Sounds good :)  The challenge will be balancing things... eg if during the time 
1 RAM segment is flushed, the others are able to consume more RAM that was 
freed up by flushing this one RAM segment, you've got a problem... or maybe at 
that point you go and flush the next one now using the most RAM, so it'd self 
balance with time.

This will mean the RAM usage is able to flare up above the high water mark...
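
As a rough sketch of the "flush whoever is using the most RAM" policy, 
including the self-balancing loop: all names below are hypothetical, the 
per-writer RAM counts are plain longs, and "flushing" is simulated by zeroing 
the count rather than writing a segment.

```java
/** Hypothetical sketch of the flush policy; not the actual IW flush code. */
final class FlushLargestPolicy {
    private FlushLargestPolicy() {}

    /**
     * Returns the index of the doc writer using the most RAM if total usage
     * exceeds the high water mark, or -1 if nothing needs flushing.
     */
    static int pickWriterToFlush(long[] ramUsed, long highWaterMark) {
        long total = 0;
        int largest = -1;
        for (int i = 0; i < ramUsed.length; i++) {
            total += ramUsed[i];
            if (largest == -1 || ramUsed[i] > ramUsed[largest]) {
                largest = i;
            }
        }
        // Never pick an empty writer: flushing it would free nothing.
        boolean over = total > highWaterMark;
        return (over && largest != -1 && ramUsed[largest] > 0) ? largest : -1;
    }

    /**
     * Keeps flushing the current largest writer until total usage is back
     * under the high water mark.  Returns how many writers were flushed.
     */
    static int flushUntilUnderLimit(long[] ramUsed, long highWaterMark) {
        int flushed = 0;
        int victim;
        while ((victim = pickWriterToFlush(ramUsed, highWaterMark)) != -1) {
            ramUsed[victim] = 0; // stand-in for writing the segment and freeing its RAM
            flushed++;
        }
        return flushed;
    }
}
```

In a real concurrent setting the other writers keep growing while the flush 
runs, which is exactly why usage can overshoot the mark; this sketch ignores 
that and only shows the selection rule.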

{quote}
I gutted the PerThread classes, then realized, it's all too
intertwined. I'd rather get something working, than spend an
excessive amount of time rearranging code that already works.
{quote}

For starters I would keep the *PerThread, but create multiple DWs?  Ie, 
removing the PerThread layer doesn't have to happen at first.

Or we could do the nuclear option -- make a new indexing chain.

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


