[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662339#action_12662339 ]
Michael McCandless commented on LUCENE-1476:
--------------------------------------------
{quote}
> Mmm. I think I might have given IndexWriter.commit() slightly different
> semantics. Specifically, I might have given it a boolean "sync" argument
> which defaults to false.
{quote}
It seems like there are several levels of increasing "durability" here:
* Make available to a reader in the same JVM (e.g. flush a new segment
to a RAMDir) -- not exposed today.
* Make available to a reader sharing the filesystem right now (flush
to a Directory in the "real" filesystem, but don't sync) -- exposed
today (but deprecated as a public API) as flush.
* Make available to readers even if the OS/machine crashes (flush to
a Directory in the "real" filesystem, and sync) -- exposed today as commit.
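To make the three levels concrete, here is a rough sketch in plain java.nio rather than Lucene code. The class and method names (DurabilityLevels, writeToHeap, writeToFile) are made up for illustration; the only real point is the difference between writing bytes and additionally calling FileChannel.force.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; none of these names exist in Lucene.
class DurabilityLevels {

    // Level 1: the bytes live only on this JVM's heap (roughly what a
    // flush to a RAMDir would give you). Invisible to other processes.
    static byte[] writeToHeap(String data) {
        return data.getBytes(StandardCharsets.UTF_8);
    }

    // Level 2 (sync == false): hand the bytes to the OS. Other processes
    // sharing the filesystem can read them, but they may still sit in the
    // OS cache and be lost if the machine crashes.
    // Level 3 (sync == true): additionally ask the OS to push the bytes
    // and file metadata to the storage device before returning.
    static void writeToFile(Path path, String data, boolean sync) {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                ch.force(true);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the file back as a string, as a second reader would.
    static String readBack(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Create a scratch file for the demonstration.
    static Path tempFile() {
        try {
            return Files.createTempFile("segment", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = tempFile();
        writeToFile(p, "flushed, not synced", false);
        System.out.println(readBack(p));
        writeToFile(p, "flushed and synced", true);
        System.out.println(readBack(p));
    }
}
```

In both file cases the reader sees the new bytes immediately; the sync flag changes only what survives a crash, which is why it is also where the cost is likely to be.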
{quote}
> Two comments. First, if you don't sync, but rather leave it up to the OS when
> it wants to actually perform the actual disk i/o, how expensive is flushing?
> Can we make it cheap enough to meet Jason's absolute change rate requirements?
{quote}
Right, I've been wondering the same thing. I think this should be our
first approach to realtime indexing, and then we swap in a RAMDir only if
performance is not good enough.
{quote}
> Second, the multi-index model is very tricky when dealing with "updates". How
> do you guarantee that you always see the "current" version of a given
> document, and only that version? When do you expose new deletes in the
> RAMDirectory, when do you expose new deletes in the FSDirectory, how do you
> manage slow merges from the RAMDirectory to the FSDirectory, how do you manage
> new adds to the RAMDirectory that take place during slow merges...
>
> Building a single-index, two-writer model that could handle fast updates while
> performing background merging was one of the main drivers behind the tombstone
> design.
{quote}
I'm not proposing a multi-index model (at least I think I'm not!). A
single IW could flush new tiny segments into a RAMDir and later merge
them into a real Dir. But I agree: let's start w/ a single Dir and
move to a RAMDir only if necessary.
{quote}
> Building a single-index, two-writer model that could handle fast updates while
> performing background merging was one of the main drivers behind the tombstone
> design.
{quote}
I think carrying the deletions in RAM (reopening the reader) is
probably fastest for Lucene. Lucene with the "reopened stream of
readers" approach can do this, but Lucy/KS (with mmap) must use
the filesystem as the intermediary.
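As a rough sketch of what "carrying the deletions in RAM" across reopened readers could look like, assuming a copy-on-reopen bit set: ReaderSnapshot and its methods are hypothetical names, and java.util.BitSet stands in for Lucene's BitVector. Each reopen clones the deleted-docs bits and applies the new deletes, so older readers keep their point-in-time view and nothing goes through the filesystem.

```java
import java.util.BitSet;

// Hypothetical illustration only; not SegmentReader and not Lucene's BitVector.
final class ReaderSnapshot {
    private final int maxDoc;
    private final BitSet deleted; // never mutated after construction

    ReaderSnapshot(int maxDoc) {
        this(maxDoc, new BitSet(maxDoc));
    }

    private ReaderSnapshot(int maxDoc, BitSet deleted) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    // "Reopen": return a new snapshot that also sees the given deletes.
    // The bits are copied, so this snapshot is left untouched.
    ReaderSnapshot reopen(int... newlyDeleted) {
        BitSet copy = (BitSet) deleted.clone();
        for (int docId : newlyDeleted) {
            if (docId < 0 || docId >= maxDoc) {
                throw new IllegalArgumentException("docId out of range: " + docId);
            }
            copy.set(docId);
        }
        return new ReaderSnapshot(maxDoc, copy);
    }

    // Iterator-style access: the next deleted doc at or after 'from',
    // or -1 if there is none.
    int nextDeleted(int from) {
        return deleted.nextSetBit(from);
    }

    int numDeleted() {
        return deleted.cardinality();
    }
}
```

Cloning the whole bit set on every reopen is the simplest thing that works; whether that cost is acceptable at high update rates is exactly the kind of thing that would need measuring.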
> BitVector implement DocIdSet
> ----------------------------
>
> Key: LUCENE-1476
> URL: https://issues.apache.org/jira/browse/LUCENE-1476
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Priority: Trivial
> Attachments: LUCENE-1476.patch, quasi_iterator_deletions.diff
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> BitVector can implement DocIdSet. This is for making
> SegmentReader.deletedDocs pluggable.