[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662339#action_12662339 ]
Michael McCandless commented on LUCENE-1476:
--------------------------------------------

{quote}
> Mmm. I think I might have given IndexWriter.commit() slightly different
> semantics. Specifically, I might have given it a boolean "sync" argument
> which defaults to false.
{quote}

It seems like there are various levels of increasing "durability" here:

  * Make available to a reader in the same JVM (eg flush new segment
    to a RAMDir) -- not exposed today.

  * Make available to a reader sharing the filesystem right now (flush
    to Directory in "real" filesystem, but don't sync) -- exposed
    today (but deprecated as a public API) as flush.

  * Make available to readers even if OS/machine crashes (flush to
    Directory in "real" filesystem, and sync) -- exposed today as
    commit.

{quote}
> Two comments. First, if you don't sync, but rather leave it up to the OS when
> it wants to actually perform the actual disk i/o, how expensive is flushing?
> Can we make it cheap enough to meet Jason's absolute change rate requirements?
{quote}

Right I've been wondering the same thing. I think this should be our
first approach to realtime indexing, and then we swap in RAMDir if
performance is not good enough.

{quote}
> Second, the multi-index model is very tricky when dealing with "updates". How
> do you guarantee that you always see the "current" version of a given
> document, and only that version? When do you expose new deletes in the
> RAMDirectory, when do you expose new deletes in the FSDirectory, how do you
> manage slow merges from the RAMDirectory to the FSDirectory, how do you manage
> new adds to the RAMDirectory that take place during slow merges...
>
> Building a single-index, two-writer model that could handle fast updates while
> performing background merging was one of the main drivers behind the tombstone
> design.
{quote}

I'm not proposing multi-index model (at least I think I'm not!). A
single IW could flush new tiny segments into a RAMDir and later merge
them into a real Dir.
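
To make the three durability levels listed above concrete, here is a minimal sketch in plain Java NIO. It is not Lucene code: the class DurabilityLevels and the methods toRam, flush, commit, tempFile and readBack are hypothetical names chosen for illustration. The only point it tries to show is that "flush" hands bytes to the OS, while "commit" additionally calls FileChannel.force so the bytes should survive an OS or machine crash.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; none of these names are Lucene APIs. */
final class DurabilityLevels {

    /** Level 1: keep the bytes on the heap, visible only inside this JVM. */
    static byte[] toRam(String segment) {
        return segment.getBytes(StandardCharsets.UTF_8);
    }

    /** Level 2: hand the bytes to the OS, but do not force them to disk. */
    static void flush(Path file, String segment) {
        write(file, segment, false);
    }

    /** Level 3: write, then ask the OS to push the bytes to the device. */
    static void commit(Path file, String segment) {
        write(file, segment, true);
    }

    private static void write(Path file, String segment, boolean sync) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(segment.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                // The potentially expensive step: content and metadata are forced out.
                ch.force(true);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: creates an empty temporary file. */
    static Path tempFile() {
        try {
            return Files.createTempFile("segment", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: reads the file back as a UTF-8 string. */
    static String readBack(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = tempFile();
        flush(file, "doc1 doc2");   // readable by another process right away
        System.out.println(readBack(file));
        commit(file, "doc3");       // also forced to the storage device
        System.out.println(readBack(file));
    }
}
```

In this sketch a reader sharing the filesystem sees the same bytes after either call; the difference is only what is guaranteed after a crash, which is why skipping the force call is the cheaper option being discussed.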
But I agree: let's start w/ a single Dir and move to RAMDir only if
necessary.

{quote}
> Building a single-index, two-writer model that could handle fast updates while
> performing background merging was one of the main drivers behind the tombstone
> design.
{quote}

I think carrying the deletions in RAM (reopening the reader) is
probably fastest for Lucene. Lucene with the "reopened stream of
readers" approach can do this, but Lucy/KS (with mmap) must use
filesystem as the intermediary.

> BitVector implement DocIdSet
> ----------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, quasi_iterator_deletions.diff
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> BitVector can implement DocIdSet. This is for making
> SegmentReader.deletedDocs pluggable.
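
As a rough illustration of the issue summary (a bit vector of deleted docs exposed through a doc-id-set style interface, held in RAM), here is a self-contained sketch. The interfaces DocIdIterator and DocIdSetSketch and the class BitVectorSketch are simplified, hypothetical stand-ins written for this example; they are not Lucene's DocIdSet, DocIdSetIterator or BitVector, and they are not taken from the attached patch.

```java
/** Hypothetical, simplified stand-in for an iterator over doc ids. */
interface DocIdIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Returns the next set doc id in increasing order, or NO_MORE_DOCS. */
    int nextDoc();
}

/** Hypothetical, simplified stand-in for a set of doc ids. */
interface DocIdSetSketch {
    DocIdIterator iterator();
}

/** A fixed-size bit vector that can be consumed as a set of doc ids. */
final class BitVectorSketch implements DocIdSetSketch {
    private final long[] bits;
    private final int size;

    BitVectorSketch(int size) {
        this.size = size;
        this.bits = new long[(size + 63) >>> 6]; // one long per 64 docs
    }

    /** Marks a doc, for example as deleted. */
    void set(int doc) {
        checkRange(doc);
        bits[doc >>> 6] |= 1L << doc; // long shifts use only the low 6 bits of doc
    }

    boolean get(int doc) {
        checkRange(doc);
        return (bits[doc >>> 6] & (1L << doc)) != 0;
    }

    private void checkRange(int doc) {
        if (doc < 0 || doc >= size) {
            throw new IndexOutOfBoundsException("doc " + doc + " not in [0, " + size + ")");
        }
    }

    @Override
    public DocIdIterator iterator() {
        return new DocIdIterator() {
            private int next = 0; // next candidate doc id to test

            @Override
            public int nextDoc() {
                // Simple linear scan; a real implementation would skip empty words.
                while (next < size) {
                    int doc = next++;
                    if (get(doc)) {
                        return doc;
                    }
                }
                return NO_MORE_DOCS;
            }
        };
    }

    public static void main(String[] args) {
        BitVectorSketch deleted = new BitVectorSketch(130);
        deleted.set(3);
        deleted.set(64);
        deleted.set(129);
        DocIdIterator it = deleted.iterator();
        for (int doc = it.nextDoc(); doc != DocIdIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            System.out.println("deleted doc " + doc);
        }
    }
}
```

Because callers in this sketch depend only on the small interface, a different in-memory representation of deletions could be swapped in without touching them, which is the kind of pluggability the issue description asks for.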