[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704051#action_12704051 ]
Michael McCandless commented on LUCENE-1313:
--------------------------------------------

{quote}
I assume it's ok for the IW.mergescheduler to be used which may not immediately perform the merge to disk (in the case of ConcurrentMergeScheduler)?
{quote}

Only if we "accept" requiring MergePolicy to be aware that some segments are in RAMDir and some are in the "real" Dir, and to "act accordingly", ie:

1) don't mix the dirs when merging,
2) when RAM is "full", merge every single RAM segment into a single "real Dir" segment (requires IW to expose how much RAM DW's buffer is currently consuming),
3) properly "maintain" the RAM segments (ie, merge RAM -> RAM somehow) so that searchers don't search too many RAM segments.

I think this approach is probably best: you're right that allowing CMS to manage these RAM segments is nice since it'll happen in the BG and will not block updates.

It does mean, though, that the RAM usage semantics of IW are no longer as "crisp" as flushing today ("once RAM is full, stop world & flush it to disk, then resume"), but I think that's acceptable and perhaps preferable since the world is no longer stopped to flush RAM -> disk.

Though one trickiness is... if a large RAM -> RAM merge takes place, we temporarily double the RAM consumption. I think MergePolicy simply shouldn't do that. Ie at no point should it be merging a very large percentage of the RAM segments. It should instead merge RAM -> disk.

This'd also mean advanced users that implement their own MergePolicy must realize when IW is used with NRT reader that additional smarts is recommended wrt

{quote}
When implementing using addIndexesNoOptimize (which blocks) I realized we probably don't want blocking to occur because that means shutting down the updates.
{quote}

Right, this is one of the strong reasons to do the "internal" approach vs the "external" one.
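To make the three MergePolicy constraints above concrete, here is a minimal sketch of the decision logic in plain Java. It does not use any Lucene API: `Segment`, `Action`, `Decision`, `decide`, and the parameters `ramBudgetBytes`, `maxRamSegments` and `maxRamMergeFraction` are all hypothetical names invented for illustration, and the "pick the smallest RAM segments first" rule is an assumption, not something proposed in this issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: none of these types are Lucene classes.
final class RamAwareMergeSketch {

    /** Stand-in for a segment: which directory it lives in and how big it is. */
    static final class Segment {
        final String name;
        final boolean inRam;
        final long sizeBytes;

        Segment(String name, boolean inRam, long sizeBytes) {
            this.name = name;
            this.inRam = inRam;
            this.sizeBytes = sizeBytes;
        }
    }

    enum Action { NONE, MERGE_RAM_TO_RAM, MERGE_ALL_RAM_TO_DISK }

    /** The chosen action plus the segments it applies to. */
    static final class Decision {
        final Action action;
        final List<Segment> segments;

        Decision(Action action, List<Segment> segments) {
            this.action = action;
            this.segments = segments;
        }
    }

    static Decision decide(List<Segment> all, long ramBudgetBytes,
                           int maxRamSegments, double maxRamMergeFraction) {
        // Rule 1: only RAM segments are candidates, so dirs are never mixed.
        List<Segment> ram = new ArrayList<>();
        long ramUsed = 0;
        for (Segment s : all) {
            if (s.inRam) {
                ram.add(s);
                ramUsed += s.sizeBytes;
            }
        }
        if (ram.isEmpty()) {
            return new Decision(Action.NONE, new ArrayList<>());
        }

        // Rule 2: RAM is "full", so merge every RAM segment into one disk segment.
        if (ramUsed >= ramBudgetBytes) {
            return new Decision(Action.MERGE_ALL_RAM_TO_DISK, ram);
        }

        // Rule 3: too many RAM segments for searchers, so merge RAM -> RAM,
        // but never a large fraction of RAM at once (that would temporarily
        // double RAM consumption). Smallest-first is an assumed heuristic.
        if (ram.size() > maxRamSegments) {
            List<Segment> bySize = new ArrayList<>(ram);
            bySize.sort(Comparator.comparingLong((Segment s) -> s.sizeBytes));
            long cap = (long) (ramUsed * maxRamMergeFraction);
            List<Segment> picked = new ArrayList<>();
            long total = 0;
            for (Segment s : bySize) {
                if (total + s.sizeBytes > cap) {
                    break;
                }
                picked.add(s);
                total += s.sizeBytes;
            }
            if (picked.size() >= 2) {
                return new Decision(Action.MERGE_RAM_TO_RAM, picked);
            }
            // No cheap RAM -> RAM merge exists: go RAM -> disk instead.
            return new Decision(Action.MERGE_ALL_RAM_TO_DISK, ram);
        }
        return new Decision(Action.NONE, new ArrayList<>());
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("ram_0", true, 60));
        segments.add(new Segment("ram_1", true, 50));
        segments.add(new Segment("disk_0", false, 1000));
        Decision d = decide(segments, 100, 4, 0.5);
        System.out.println(d.action + " over " + d.segments.size() + " segment(s)");
    }
}
```

In a real implementation the chosen merge would be handed to the merge scheduler so that it runs in the background; the sketch only covers which segments get selected.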
{quote}
Also a random thought, it seems like ConcurrentMergeScheduler works great for RAMDir merging, how does it compare with SerialMS on an FSDirectory? It seems like it shouldn't be too much faster given the IO sequential access bottleneck?
{quote}

By far the biggest win of CMS over SMS is in the first merge, because it does not block the further addition of docs. Thus an app can continue indexing into the RAM buffer (consuming CPU & RAM resources) while a BG thread consumes RAM + IO resources. This is very much a win.

Beyond the first merge... in theory, modern IO systems have concurrency (eg the NCQ in a single SATA drive) so you should "gain" by having several threads performing IO at once. The OS & hard drives attempt to re-order the requests in a more optimal way (like an elevator, sweeping floors). I haven't explicitly tested this with Lucene...

I believe SSDs handle concurrent requests very well since under the hood most of them are multi-channel, basically RAID0, devices (eg Intel X25M has 10 channels).

> Realtime Search
> ---------------
>
> Key: LUCENE-1313
> URL: https://issues.apache.org/jira/browse/LUCENE-1313
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Affects Versions: 2.4.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch,
> LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch,
> lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.
> Possible future directions:
> * Optimistic concurrency
> * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory
> enables replication. It is difficult to replicate using other methods
> because while the document may easily be serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and
> searching concurrently.

--
This message is automatically generated by JIRA.