[jira] Commented: (LUCENE-1313) Realtime Search

Jason Rutherglen (JIRA) Tue, 05 May 2009 22:22:55 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706305#action_12706305 ]


Jason Rutherglen commented on LUCENE-1313:
------------------------------------------

I'm not sure we have the right model yet for deciding when to
flush the RAM buffer and/or the RAM segments. Perhaps we can simply
divide the RAM buffer size in half, allocating one part to the
RAM buffer and the other to the RAM segments. When either exceeds its
(ramBufferSize/2) allotment, it's flushed to disk. This way, if
the RAM buffer size is 32MB, we will always safely flush 16MB to
disk. The more RAM allotted, the greater the size of what's flushed
to disk. We may eventually want to offer an expert method to set
the RAM buffer size and the RAM dir max size individually.
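
A minimal sketch of the half-split rule, assuming a single byte budget that is divided evenly between the indexing RAM buffer and the NRT RAM segments. The class and method names (HalfSplitFlushPolicy, shouldFlushRamBuffer, shouldFlushRamSegments) are hypothetical and not part of Lucene's API; the two-argument constructor stands in for the "expert" option of setting both limits individually.

```java
// Hypothetical sketch (names are illustrative, not Lucene API): one RAM budget
// split between the indexing RAM buffer and the NRT RAM segments.
final class HalfSplitFlushPolicy {
    private final long ramBufferLimit;   // max bytes for the in-memory RAM buffer
    private final long ramSegmentsLimit; // max bytes for segments flushed to RAM

    // Default: each side gets half of the configured RAM buffer size.
    HalfSplitFlushPolicy(long ramBufferSize) {
        this(ramBufferSize / 2, ramBufferSize - ramBufferSize / 2);
    }

    // "Expert" variant: set the two limits individually.
    HalfSplitFlushPolicy(long ramBufferLimit, long ramSegmentsLimit) {
        if (ramBufferLimit <= 0 || ramSegmentsLimit <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.ramBufferLimit = ramBufferLimit;
        this.ramSegmentsLimit = ramSegmentsLimit;
    }

    long ramBufferLimit() { return ramBufferLimit; }
    long ramSegmentsLimit() { return ramSegmentsLimit; }

    // Flush the RAM buffer to disk once it exceeds its allotment.
    boolean shouldFlushRamBuffer(long ramBufferUsed) {
        return ramBufferUsed > ramBufferLimit;
    }

    // Flush the RAM segments to disk once they exceed their allotment.
    boolean shouldFlushRamSegments(long ramSegmentsUsed) {
        return ramSegmentsUsed > ramSegmentsLimit;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        HalfSplitFlushPolicy policy = new HalfSplitFlushPolicy(32 * mb);
        System.out.println(policy.ramBufferLimit() / mb);          // 16
        System.out.println(policy.shouldFlushRamBuffer(17 * mb));  // true
        System.out.println(policy.shouldFlushRamSegments(8 * mb)); // false
    }
}
```

With a 32MB budget each side is capped at 16MB, so whichever side overflows first is flushed as a segment of roughly 16MB.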

Put another way, I think we need a balanced upper limit for the
RAM buffer and the NRT RAM dir, which seems (to me) hard to
achieve if either one is allowed to grow too much at the expense
of the other.

I'd like to stay away from flushing the RAM buffer to disk when
it's below, say, 20% of the RAM buffer size, as that seems
inefficient (because we'll have to do an expensive disk merge on
it later). On the other hand, if the user is not calling
getReader very often and we're auto-flushing at 1/2 the RAM
buffer size, we're shortchanging ourselves and only flushing a
segment half the size of what it could be. I suppose we could
stick with the 1/2 model, only turning it on once RAM segments
are being merged in RAM?
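
One way to express the two thresholds in code, as a sketch only: the 20% floor and the "half only while merging in RAM" switch come from the paragraph above, while the class name FlushThresholds and its methods are hypothetical, and 20% is the illustrative figure from the text rather than a tuned value.

```java
// Hypothetical sketch of the two thresholds discussed above; names and the
// 20% figure are illustrative, not Lucene API or tuned values.
final class FlushThresholds {
    // Below this share of the RAM buffer size, flushing to disk is avoided.
    static final long MIN_DISK_FLUSH_PERCENT = 20;

    // Size at which the RAM buffer is auto-flushed to disk: half the budget
    // once RAM segments are being merged in RAM, the full budget otherwise.
    static long autoFlushLimit(long ramBufferSize, boolean mergingRamSegments) {
        return mergingRamSegments ? ramBufferSize / 2 : ramBufferSize;
    }

    // True if the buffered bytes are at least 20% of the RAM buffer size,
    // i.e. large enough to be worth writing as a disk segment.
    static boolean largeEnoughForDisk(long ramBufferUsed, long ramBufferSize) {
        return ramBufferUsed * 100 >= ramBufferSize * MIN_DISK_FLUSH_PERCENT;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(autoFlushLimit(32 * mb, false) / mb); // 32
        System.out.println(autoFlushLimit(32 * mb, true) / mb);  // 16
        System.out.println(largeEnoughForDisk(4 * mb, 32 * mb)); // false (12.5%)
        System.out.println(largeEnoughForDisk(8 * mb, 32 * mb)); // true (25%)
    }
}
```

Using integer percentages keeps the comparison exact; when the user rarely calls getReader, the full budget is used and no segment is flushed at half size.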

If, when merging RAM segments (using the specialized
RAMMergePolicy), we only merge in RAM the ones that fit, what do
we do with the remaining RAM segments that need to be flushed to
disk? What if they only make up 20% of the total size of the
RAM segments? Wouldn't merging that 20% to disk be inefficient?
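
To make the "leftover" case concrete, here is a sketch that picks RAM segments first-fit into a RAM merge budget and reports what share of the RAM-segment bytes did not fit. It is not the RAMMergePolicy from the patch; the class RamMergeSelection and the first-fit selection order are assumptions chosen for illustration, and segments are reduced to their byte sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedily pick RAM segments that fit into a RAM merge
// budget and report how much is left over. Not Lucene's MergePolicy API.
final class RamMergeSelection {
    final List<Long> mergeInRam = new ArrayList<>(); // sizes merged in RAM
    final List<Long> leftover = new ArrayList<>();   // sizes that did not fit

    // First-fit in the given order: a segment is merged in RAM only if it
    // still fits in the remaining budget; otherwise it is left over.
    static RamMergeSelection select(List<Long> segmentSizes, long ramBudget) {
        RamMergeSelection s = new RamMergeSelection();
        long used = 0;
        for (long size : segmentSizes) {
            if (used + size <= ramBudget) {
                s.mergeInRam.add(size);
                used += size;
            } else {
                s.leftover.add(size);
            }
        }
        return s;
    }

    private static long sum(List<Long> sizes) {
        long total = 0;
        for (long size : sizes) total += size;
        return total;
    }

    // Share (0.0 to 1.0) of all RAM-segment bytes that did not fit.
    double leftoverFraction() {
        long total = sum(mergeInRam) + sum(leftover);
        return total == 0 ? 0.0 : (double) sum(leftover) / total;
    }

    public static void main(String[] args) {
        RamMergeSelection s = select(List.of(4L, 4L, 2L), 8L);
        System.out.println(s.mergeInRam);         // [4, 4]
        System.out.println(s.leftover);           // [2]
        System.out.println(s.leftoverFraction()); // 0.2
    }
}
```

In this example the two 4-unit segments fill the RAM budget and the 2-unit segment is left over, which is exactly the 20% case the question asks about.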

> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
> lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.  
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory 
> enables replication.  It is difficult to replicate using other methods 
> because while the document may easily be serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and 
> searching concurrently.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


