[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

Michael McCandless (JIRA) Fri, 09 Jan 2009 04:00:24 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662339#action_12662339
 ]


Michael McCandless commented on LUCENE-1476:
--------------------------------------------

{quote}
> Mmm. I think I might have given IndexWriter.commit() slightly different
> semantics. Specifically, I might have given it a boolean "sync" argument
> which defaults to false.
{quote}

It seems like there are various levels of increasing "durability" here:

  * Make available to a reader in the same JVM (eg flush new segment
    to a RAMDir) -- not exposed today.

  * Make available to a reader sharing the filesystem right now (flush
    to Directory in "real" filesystem, but don't sync) -- exposed
    today (but deprecated as a public API) as flush.

  * Make available to readers even if OS/machine crashes (flush to
    Directory in "real" filesystem, and sync) -- exposed today as commit.
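
To make the three levels concrete, here is a minimal sketch in plain JDK code rather than Lucene code. The class and method names (DurabilityLevels, flushToRam, flushToFile) are made up for illustration and are not part of any Lucene API. It only shows the difference between keeping bytes on the heap, handing them to the OS without a sync, and forcing them to the storage device:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; none of these names exist in Lucene.
public class DurabilityLevels {

    // Level 1: the bytes stay on the JVM heap, so only this JVM can read them.
    static byte[] flushToRam(String segment) {
        return segment.getBytes(StandardCharsets.UTF_8);
    }

    // Levels 2 and 3: write the bytes to a file. With sync == false the data is
    // handed to the OS (readable by other processes, but it may be lost if the
    // machine crashes); with sync == true we also ask the OS to force it to the
    // storage device before returning.
    static void flushToFile(Path file, String segment, boolean sync) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(segment.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                ch.force(true); // force file content and metadata to the device
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("durability");
        byte[] inRam = flushToRam("segment_1");
        flushToFile(dir.resolve("flushed.seg"), "segment_2", false);
        flushToFile(dir.resolve("committed.seg"), "segment_3", true);
        System.out.println(inRam.length + " bytes in RAM, "
                + Files.size(dir.resolve("flushed.seg")) + " bytes flushed, "
                + Files.size(dir.resolve("committed.seg")) + " bytes synced");
    }
}
```

The only difference between the second and third level in this sketch is the force call, which is where the cost of a sync would show up.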

{quote}
> Two comments. First, if you don't sync, but rather leave it up to the OS when
> it wants to actually perform the actual disk i/o, how expensive is flushing?
> Can we make it cheap enough to meet Jason's absolute change rate requirements?
{quote}

Right, I've been wondering the same thing.  I think flushing without
sync should be our first approach to realtime indexing, and then we
swap in RAMDir if performance is not good enough.

{quote}
> Second, the multi-index model is very tricky when dealing with "updates". How
> do you guarantee that you always see the "current" version of a given
> document, and only that version? When do you expose new deletes in the
> RAMDirectory, when do you expose new deletes in the FSDirectory, how do you
> manage slow merges from the RAMDirectory to the FSDirectory, how do you manage
> new adds to the RAMDirectory that take place during slow merges...
>
> Building a single-index, two-writer model that could handle fast updates while
> performing background merging was one of the main drivers behind the tombstone
> design.
{quote}

I'm not proposing a multi-index model (at least I think I'm not!).  A
single IW could flush new tiny segments into a RAMDir and later merge
them into a real Dir.  But I agree: let's start w/ a single Dir and
move to RAMDir only if necessary.

{quote}
> Building a single-index, two-writer model that could handle fast updates while
> performing background merging was one of the main drivers behind the tombstone
> design.
{quote}

I think carrying the deletions in RAM (reopening the reader) is
probably fastest for Lucene.  Lucene with the "reopened stream of
readers" approach can do this, but Lucy/KS (with mmap) must use
the filesystem as the intermediary.
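
As a rough sketch of what "carrying the deletions in RAM" could look like, here is a small copy-on-write bit set in plain Java. DocIds, BitSetDeletions and withDeleted are hypothetical names, not Lucene's BitVector or DocIdSet, and treating each reopen as a fresh snapshot of the deletions is an assumption made for this example:

```java
import java.util.BitSet;

// Hypothetical illustration only; these are not Lucene classes.
public class RamDeletions {

    // Stand-in for a pluggable "set of doc ids" abstraction.
    interface DocIds {
        boolean contains(int docId);
        int nextDoc(int fromInclusive); // returns -1 when there are no more docs
    }

    // Deleted docs held in RAM, one bit per document in the segment.
    static final class BitSetDeletions implements DocIds {
        private final BitSet bits;

        BitSetDeletions(BitSet bits) {
            this.bits = bits;
        }

        public boolean contains(int docId) {
            return bits.get(docId);
        }

        public int nextDoc(int fromInclusive) {
            return bits.nextSetBit(fromInclusive);
        }

        // Copy-on-write: the existing instance is left untouched, so a reader
        // that already holds it keeps its point-in-time view, while a reopened
        // reader would be given the returned instance.
        BitSetDeletions withDeleted(int docId) {
            BitSet copy = (BitSet) bits.clone();
            copy.set(docId);
            return new BitSetDeletions(copy);
        }
    }

    public static void main(String[] args) {
        BitSetDeletions before = new BitSetDeletions(new BitSet(8));
        BitSetDeletions after = before.withDeleted(3).withDeleted(5);
        System.out.println(before.contains(3)); // false: old snapshot unchanged
        System.out.println(after.contains(3));  // true
        System.out.println(after.nextDoc(4));   // 5
    }
}
```

Nothing in this sketch touches the filesystem, which is the point of contrast with an mmap-based design.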


> BitVector implement DocIdSet
> ----------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, quasi_iterator_deletions.diff
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> BitVector can implement DocIdSet.  This is for making 
> SegmentReader.deletedDocs pluggable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


