[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706377#action_12706377 ]
Michael McCandless commented on LUCENE-1313:
--------------------------------------------

{quote}
> RAMDir changes (deletes are applied, or a new RAM segment is
> created), we must push down to DW that usage with a new synchronized
> method.

Sounds like we create a subclass of RAMDirectory with this functionality?
{quote}

I don't think that's needed. I think whenever IW makes a change to the RAMDir, which is easily tracked, it pushes to DW the new RAMDir size.

{quote}
> We don't need IW.getRamLogMergePolicy()?

Because we don't want the user customizing this?
{quote}

That, and because it's only used to determine CFS or not, which we've turned off for RAMDir.

{quote}
> We should no longer need IndexWriter.getFlushDirectory? IE, IW
> once again has a single "Directory" as seen by IFD,
> DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD; in
> the non-NRT case it's the Dir that was passed in (unless, in a future
> issue, we explore using FSD, too, for better performance).

Pass in FSD in the constructor of DocumentsWriter (and others) as before?
{quote}

Right. All these places couldn't care less if they are dealing w/ FSD or a "real" dir. They should simply use the Directory API as they previously did.

{quote}
> I still don't think we need a separate RAMMergeScheduler; I
> think CMS should simply always run such merges (ie not block on max
> thread count). IW.getNextMerge can then revert to its former
> self.

Where does the thread come from for this if we're using max threads? If we allocate one, we're over limit and keeping it around. We'd need a more advanced threadpool that elastically grows the thread pool and kills threads that are unused over time. With Java 1.5 we can use ThreadPoolExecutor. Is a dedicated thread pool something we want to go to? Even then we can potentially still max out a given thread pool with requests to merge one directory or the other. We'd probably still need two separate thread pools.
{quote}

The thread is simply launched w/o checking maxThreadCount, if the merge is in RAM.

Right, with JDK 1.5 we can make CMS better about pooling threads. Right now it does no long-term pooling (unless another merge happens to be needed when a thread finishes its last merge).

{quote}
> MergePolicy.OneMerge.segString no longer needs to take a
> Directory (because it now stores a Directory).

Yeah, I noticed this, I'll change it. MergeSpecification.segString is public and takes a directory that is not required. What to do?
{quote}

Do the usual back-compat dance -- deprecate it and add the new one.

{quote}
> The dual directories is continuing to push deeper (when I'm
> wanting to do the reverse). EG, MergeScheduler.getDestinationDirs
> should not be needed?

If we remove getFlushDirectory, are you saying getDirectory should return the FSD if RAM NRT is turned on? This seems counter intuitive in that we still need a clear separation of the two directories? The user would expect the directory they passed into the ctor to be returned?
{quote}

I agree, we should leave getDirectory() as is (returns whatever Dir was passed in). We can keep getFlushDirectory, but it should not have duality inside it -- it should simply return the FSD (in the NRT case) or the normal dir. I don't really like the name getFlushDirectory... but can't think of a better one yet.

Then, nothing outside of IW should ever know there are two directories at play. They all simply deal with the one and only Directory that IW hands out.

On the "when to flush to RAM" question... I agree it's tricky. This logic belongs in the RAMMergePolicy. That policy needs to be empowered to decide if a new flush goes to RAM or disk, to decide when to merge all RAM segments to a new disk segment, to be able to check if IW is in NRT mode, etc. Probably the RAM merge policy also needs control over how much of the RAM buffer it's going to give to DW, too.
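
One possible shape for that decision, as a rough Java sketch. Every name here (FlushTargetSketch, chooseTarget, shouldMergeRamToDisk, ramBudgetBytes) is hypothetical and not part of Lucene, and the size-based rules for the NRT case are only placeholder assumptions, since the actual logic is still undecided. The one rule taken from this discussion is that the non-NRT case keeps flushing straight to disk.

```java
// Hypothetical sketch; none of these names exist in Lucene.
public class FlushTargetSketch {

    public enum Target { RAM, DISK }

    // In a real policy this would come from asking IW whether it is in NRT mode.
    private final boolean nrtMode;

    // Assumed share of the RAM buffer that the policy reserves for RAM segments.
    private final long ramBudgetBytes;

    public FlushTargetSketch(boolean nrtMode, long ramBudgetBytes) {
        this.nrtMode = nrtMode;
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /** Decide where a newly flushed segment should go. */
    public Target chooseTarget(long ramSegmentBytes, long newSegmentBytes) {
        if (!nrtMode) {
            return Target.DISK; // non-NRT: unchanged, always flush straight to disk
        }
        // Placeholder rule (assumption): stay in RAM only while the budget allows it.
        return ramSegmentBytes + newSegmentBytes <= ramBudgetBytes
                ? Target.RAM
                : Target.DISK;
    }

    /** Placeholder rule (assumption): merge all RAM segments to disk once the budget is used up. */
    public boolean shouldMergeRamToDisk(long ramSegmentBytes) {
        return nrtMode && ramSegmentBytes >= ramBudgetBytes;
    }
}
```

Keeping both questions (where a flush goes, and when RAM segments move to disk) inside one object is only meant to show that the rest of IW would not need to know which rule is in effect.
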

At first the policy should not change the non-NRT case (ie one always flushes straight to disk). We can play w/ that in a separate issue. Need to think more about the logic...

> Realtime Search
> ---------------
>
> Key: LUCENE-1313
> URL: https://issues.apache.org/jira/browse/LUCENE-1313
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Affects Versions: 2.4.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch,
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch,
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch,
> lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.
> Possible future directions:
> * Optimistic concurrency
> * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory
> enables replication. It is difficult to replicate using other methods
> because while the document may easily be serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and
> searching concurrently.