[jira] Commented: (LUCENE-1313) Realtime Search

Michael McCandless (JIRA) Wed, 06 May 2009 04:17:57 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706377#action_12706377 ]


Michael McCandless commented on LUCENE-1313:
--------------------------------------------


{quote}
> RAMDir changes (deletes are applied, or a new RAM segment is
> created), we must push down to DW that usage with a new synchronized
> method.

Sounds like we create a subclass of RAMDirectory with this
functionality?
{quote}

I don't think that's needed.  I think whenever IW makes a change to
the RAMDir, which is easily tracked, it pushes to DW the new RAMDir
size.
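
A rough sketch of that push, using a hypothetical stand-in class rather
than the real DocumentsWriter. The names setRamDirSize, addBufferedBytes
and needsFlush are assumptions for illustration, as is the idea that DW
compares the combined usage against a single RAM budget:

```java
// Hypothetical stand-in for DocumentsWriter (DW); not the real Lucene class.
class DocumentsWriterSketch {
    private final long ramBudgetBytes; // assumed shared budget: RAMDir + buffered docs
    private long ramDirBytes;          // last RAMDir size pushed down by IW
    private long bufferedBytes;        // bytes DW itself is currently buffering

    DocumentsWriterSketch(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    // The synchronized push: IW calls this after every change it makes
    // to the RAMDir (deletes applied, new RAM segment created, ...).
    synchronized void setRamDirSize(long bytes) {
        ramDirBytes = bytes;
    }

    synchronized void addBufferedBytes(long bytes) {
        bufferedBytes += bytes;
    }

    // Assumed use of the pushed value: decide whether a flush is due.
    synchronized boolean needsFlush() {
        return ramDirBytes + bufferedBytes > ramBudgetBytes;
    }
}

class RamDirSizeDemo {
    public static void main(String[] args) {
        DocumentsWriterSketch dw = new DocumentsWriterSketch(100);
        dw.addBufferedBytes(60);
        System.out.println("before RAMDir change: " + dw.needsFlush());
        dw.setRamDirSize(50); // IW changed the RAMDir and pushes its new size
        System.out.println("after RAMDir change: " + dw.needsFlush());
    }
}
```

With this shape no RAMDirectory subclass is involved; IW is the only
caller and already knows when the RAMDir changes.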

{quote}
> We don't need IW.getRamLogMergePolicy()?

Because we don't want the user customizing this?
{quote}
That, and because it's only used to determine CFS or not, which we've
turned off for RAMDir.

{quote}
> We should no longer need IndexWriter.getFlushDirectory? IE, IW
> once again has a single "Directory" as seen by IFD,
> DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD; in
> the non-NRT case it's the Dir that was passed in (unless, in a future
> issue, we explore using FSD, too, for better performance).

Pass in FSD in the constructor of DocumentsWriter (and others) as
before?
{quote}

Right.  All these places couldn't care less whether they are dealing
w/ FSD or a "real" dir.  They should simply use the Directory API as
they previously did.

{quote}
> I still don't think we need a separate RAMMergeScheduler; I
> think CMS should simply always run such merges (ie not block on max
> thread count). IW.getNextMerge can then revert to its former
> self.

Where does the thread come from for this if we're using max threads?
If we allocate one, we're over limit and keeping it around. We'd need
a more advanced threadpool that elastically grows the thread pool and
kills threads that are unused over time. With Java 1.5 we can use
ThreadPoolExecutor. Is a dedicated thread pool something we want to
go to? Even then we can potentially still max out a given thread pool
with requests to merge one directory or the other. We'd probably
still need two separate thread pools.
{quote}

The thread is simply launched w/o checking maxThreadCount, if the
merge is in RAM.

Right, with JDK 1.5 we can make CMS better about pooling threads.
Right now it does no long-term pooling (unless another merge happens
to be needed when a thread finishes its last merge).
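
A minimal sketch of that rule, written as a hypothetical scheduler
class and not as the real ConcurrentMergeScheduler. The method names,
and the choice to count RAM merges separately from disk merges, are
assumptions; the real CMS would block where this sketch returns false:

```java
// Hypothetical sketch; not the real ConcurrentMergeScheduler.
class MergeSchedulerSketch {
    private final int maxThreadCount;
    private int runningDiskMerges;
    private int runningRamMerges;

    MergeSchedulerSketch(int maxThreadCount) {
        this.maxThreadCount = maxThreadCount;
    }

    // Returns true if a merge thread may be launched right now.
    synchronized boolean tryStart(boolean mergeIsInRam) {
        if (mergeIsInRam) {
            runningRamMerges++; // RAM merges skip the maxThreadCount check
            return true;
        }
        if (runningDiskMerges >= maxThreadCount) {
            return false;       // a real scheduler would wait here instead
        }
        runningDiskMerges++;
        return true;
    }

    synchronized void finished(boolean mergeIsInRam) {
        if (mergeIsInRam) {
            runningRamMerges--;
        } else {
            runningDiskMerges--;
        }
    }

    synchronized int running() {
        return runningDiskMerges + runningRamMerges;
    }
}

class MergeSchedulerDemo {
    public static void main(String[] args) {
        MergeSchedulerSketch cms = new MergeSchedulerSketch(1);
        System.out.println("disk merge #1 starts: " + cms.tryStart(false));
        System.out.println("disk merge #2 starts: " + cms.tryStart(false));
        System.out.println("RAM merge starts:     " + cms.tryStart(true));
    }
}
```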

{quote}
> MergePolicy.OneMerge.segString no longer needs to take a
> Directory (because it now stores a Directory).

Yeah, I noticed this, I'll change it. MergeSpecification.segString is
public and takes a directory that is not required. What to do?
{quote}
Do the usual back-compat dance -- deprecate it and add the new one.
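
Roughly like this, shown on hypothetical stand-in classes (DirSketch
and OneMergeSketch are not Lucene classes, and the string format is
made up for illustration):

```java
// Hypothetical stand-in for Lucene's Directory.
class DirSketch {
    final String name;

    DirSketch(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for a merge that now stores its own directory.
class OneMergeSketch {
    private final DirSketch dir;
    private final String[] segments;

    OneMergeSketch(DirSketch dir, String... segments) {
        this.dir = dir;
        this.segments = segments;
    }

    /**
     * Old signature, kept so existing callers still compile.
     *
     * @deprecated the argument is ignored; use {@link #segString()} instead.
     */
    @Deprecated
    String segString(DirSketch ignored) {
        return segString();
    }

    // New signature: relies on the stored directory.
    String segString() {
        return dir.name + ":" + String.join(" ", segments);
    }
}

class SegStringDemo {
    public static void main(String[] args) {
        OneMergeSketch merge = new OneMergeSketch(new DirSketch("ram"), "_0", "_1");
        System.out.println(merge.segString());
    }
}
```

The deprecated overload simply delegates, so old and new callers see
the same result until the old one is removed in a later release.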

{quote}
> The dual directories are continuing to push deeper (when I want
> to do the reverse). EG, MergeScheduler.getDestinationDirs
> should not be needed?

If we remove getFlushDirectory, are you saying getDirectory should
return the FSD if RAM NRT is turned on? This seems counterintuitive,
since we still need a clear separation of the two directories. The
user would expect the directory they passed into the ctor to be
returned.
{quote}

I agree, we should leave getDirectory() as is (returns whatever Dir
was passed in).

We can keep getFlushDirectory, but it should not have duality inside it
-- it should simply return the FSD (in the NRT case) or the normal
dir.  I don't really like the name getFlushDirectory... but can't
think of a better one yet.

Then, nothing outside of IW should ever know there are two directories
at play.  They all simply deal with the one and only Directory that IW
hands out.
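
To make the intended split concrete, here is a small sketch on
hypothetical stand-in types (none of these are the real Lucene
classes, and the constructor flag is an assumption). The point is that
the FSD-or-normal-dir choice is made once, inside the writer:

```java
// Hypothetical stand-ins; not the real Lucene Directory classes.
interface DirSketch {}

class PlainDirSketch implements DirSketch {}

class RamDirSketch implements DirSketch {}

// Stand-in for the FSD: wraps a RAM dir and the user's dir.
class FsdSketch implements DirSketch {
    final DirSketch ram;
    final DirSketch disk;

    FsdSketch(DirSketch ram, DirSketch disk) {
        this.ram = ram;
        this.disk = disk;
    }
}

class IndexWriterSketch {
    private final DirSketch directory;      // what the user passed in
    private final DirSketch flushDirectory; // what internal components see

    IndexWriterSketch(DirSketch directory, boolean nrt) {
        this.directory = directory;
        // Decided once here, so there is no duality inside the getter.
        this.flushDirectory = nrt
                ? new FsdSketch(new RamDirSketch(), directory)
                : directory;
    }

    // Unchanged: returns whatever Dir was passed in.
    DirSketch getDirectory() {
        return directory;
    }

    // The one Directory handed to everything else.
    DirSketch getFlushDirectory() {
        return flushDirectory;
    }
}

class FlushDirectoryDemo {
    public static void main(String[] args) {
        DirSketch userDir = new PlainDirSketch();
        IndexWriterSketch nrt = new IndexWriterSketch(userDir, true);
        System.out.println(nrt.getDirectory() == userDir);
        System.out.println(nrt.getFlushDirectory() instanceof FsdSketch);
    }
}
```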

On the "when to flush to RAM" question... I agree it's tricky.  This
logic belongs in the RAMMergePolicy.  That policy needs to be
empowered to decide if a new flush goes to RAM or disk, to decide when
to merge all RAM segments to a new disk segment, to be able to check
if IW is in NRT mode, etc.  Probably the RAM merge policy also needs
control over how much of the RAM buffer it's going to give to DW,
too. At first the policy should not change the non-NRT case (ie one
always flushes straight to disk).  We can play w/ that in a separate
issue.  Need to think more about the logic...
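
As a starting point for that thinking, the responsibilities listed
above could be written down as an interface. Everything here is
hypothetical: the type and method names are invented, and the
half-the-budget thresholds in the sample implementation are arbitrary
placeholders, not a proposal for the actual logic:

```java
// Hypothetical interface listing the decisions the RAM merge policy owns.
interface RamMergePolicySketch {
    // Should this flush go to RAM (true) or straight to disk (false)?
    boolean flushToRam(boolean nrtMode, long flushBytes, long ramDirBytes);

    // Is it time to merge all RAM segments into a new disk segment?
    boolean mergeRamToDisk(long ramDirBytes);

    // How much of the RAM buffer DW gets, given current RAMDir usage.
    long ramBufferForDocumentsWriter(long ramDirBytes);
}

// Sample implementation with arbitrary placeholder thresholds.
class HalfBudgetRamPolicy implements RamMergePolicySketch {
    private final long totalRamBytes;

    HalfBudgetRamPolicy(long totalRamBytes) {
        this.totalRamBytes = totalRamBytes;
    }

    @Override
    public boolean flushToRam(boolean nrtMode, long flushBytes, long ramDirBytes) {
        // The non-NRT case is left unchanged: always flush straight to disk.
        if (!nrtMode) {
            return false;
        }
        return ramDirBytes + flushBytes <= totalRamBytes / 2;
    }

    @Override
    public boolean mergeRamToDisk(long ramDirBytes) {
        return ramDirBytes > totalRamBytes / 2;
    }

    @Override
    public long ramBufferForDocumentsWriter(long ramDirBytes) {
        return Math.max(0, totalRamBytes - ramDirBytes);
    }
}

class RamPolicyDemo {
    public static void main(String[] args) {
        RamMergePolicySketch policy = new HalfBudgetRamPolicy(100);
        System.out.println("non-NRT flush to RAM: " + policy.flushToRam(false, 10, 0));
        System.out.println("NRT flush to RAM:     " + policy.flushToRam(true, 10, 0));
    }
}
```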


> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
> lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.  
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory 
> enables replication.  It is difficult to replicate using other methods 
> because while the document may easily be serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and 
> searching concurrently.
