[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

Michael McCandless (JIRA) Tue, 16 Mar 2010 02:46:52 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845777#action_12845777
 ]


Michael McCandless commented on LUCENE-2312:
--------------------------------------------

bq. The tricky part is to make sure that a reader always sees a consistent 
snapshot of the index. At the same time a reader must not follow pointers to 
non-published locations (e.g. array blocks).

Right, I'm just not familiar with what specifically the JMM (Java Memory Model) 
says about one thread writing to a byte[] while another thread reads it.

In general, for our usage, the reader threads will never read into an area that 
has not yet been written to.  That works in our favor (they can't cache bytes 
they never read).  EXCEPT that the CPU loads bytes on a word boundary, so if our 
reader thread reads only 1 byte and no more (because that is the end of the 
posting for now), the CPU may very well have pulled in the following 7 bytes 
(for example) and then cached them, which is illegal for our purposes.

We'd better write some serious tests for this... including reader threads that 
just enumerate the postings for a single rarish term over and over while writer 
threads index docs that occasionally contain that term.  I think that's the 
worst case for a JMM violation since the number of bytes cached is small.
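
A rough sketch of what such a test could look like, reduced to a bare byte[] 
so it runs without Lucene.  Everything here is hypothetical: the class and 
method names are made up, a fixed byte pattern stands in for real postings, and 
the writer publishes its progress through a volatile int, which is only one of 
several ways to get a happens-before edge.  Readers re-scan the published 
prefix over and over and count any byte that differs from what the writer 
stored.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the RAM buffer; none of these names come from Lucene.
final class PostingBufferStress {
    private final byte[] buffer;
    // Number of bytes the writer has finished writing. The volatile store
    // publishes every earlier store to buffer[0..published).
    private volatile int published = 0;

    PostingBufferStress(int capacity) {
        buffer = new byte[capacity];
    }

    // Deterministic, non-zero pattern so a stale (still zero) byte is detectable.
    static byte expected(int i) {
        return (byte) ((i % 127) + 1);
    }

    // Writer: store the byte first, then publish the new length.
    void writeAll() {
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = expected(i);
            published = i + 1; // volatile store
        }
    }

    // Reader: re-scan the published prefix until the writer is done and
    // count bytes that differ from what the writer stored.
    long readUntilDone() {
        long mismatches = 0;
        int limit;
        do {
            limit = published; // volatile load: the snapshot for this pass
            for (int i = 0; i < limit; i++) {
                if (buffer[i] != expected(i)) {
                    mismatches++;
                }
            }
        } while (limit < buffer.length);
        return mismatches;
    }

    // Runs one writer against several readers and returns the total mismatch count.
    static long run(int capacity, int readers) {
        PostingBufferStress s = new PostingBufferStress(capacity);
        AtomicLong mismatches = new AtomicLong();
        Thread[] threads = new Thread[readers];
        for (int r = 0; r < readers; r++) {
            threads[r] = new Thread(() -> mismatches.addAndGet(s.readUntilDone()));
            threads[r].start();
        }
        Thread writer = new Thread(s::writeAll);
        writer.start();
        try {
            writer.join();
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("stress run interrupted", e);
        }
        return mismatches.get();
    }
}
```

With the volatile in place the JMM guarantees zero mismatches.  The interesting 
experiment is dropping the volatile and running on many cores for a long time, 
though a clean run there proves nothing about correctness.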

It's too bad Java doesn't offer higher-level control over CPU caching.  E.g., 
in our usage, it would help if we could call something like a (hypothetical) 
System.flushCPUCache whenever a thread enters a newly reopened reader, because 
when accessing postings via a given reader we want point-in-time searching 
anyway, so any bytes the CPU has cached are perfectly fine.  We only need a CPU 
cache flush when a reader is reopened.
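
For what it's worth, the closest thing the JMM offers is a volatile load: 
everything a writer stored before its volatile store is visible to a thread 
that later loads that same variable and sees the stored value.  A minimal 
sketch of "sync only on reopen" built on that, assuming a single writer thread. 
RamBuffer, Reader and openReader are made-up names, not Lucene's IndexWriter or 
IndexReader API.

```java
// Hypothetical names throughout; this is not Lucene's IndexWriter/IndexReader API.
final class RamBuffer {
    private final byte[] bytes;
    // Number of bytes visible to readers opened from now on.
    private volatile int published = 0;

    RamBuffer(int capacity) {
        bytes = new byte[capacity];
    }

    // Single writer thread assumed: store the data first, then publish the new length.
    void append(byte[] data) {
        int start = published;
        System.arraycopy(data, 0, bytes, start, data.length);
        published = start + data.length; // volatile store
    }

    // "Reopen": one volatile load fixes the point-in-time view for this reader.
    Reader openReader() {
        return new Reader(bytes, published);
    }

    static final class Reader {
        private final byte[] bytes;
        private final int limit; // never changes after the reader is opened

        Reader(byte[] bytes, int limit) {
            this.bytes = bytes;
            this.limit = limit;
        }

        int length() {
            return limit;
        }

        // Plain (non-volatile) read; refuses to look past the snapshot.
        byte get(int i) {
            if (i < 0 || i >= limit) {
                throw new IndexOutOfBoundsException("index " + i + " is outside the snapshot");
            }
            return bytes[i];
        }
    }
}
```

A reader opened this way pays for one volatile load at reopen and uses plain 
array reads afterwards.  Bytes appended later sit at indexes at or above its 
limit, which it never touches, so whatever the CPU pulled in alongside the last 
visible byte is never consulted.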

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

