[ 
https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789714#action_12789714
 ] 

Michael McCandless commented on LUCENE-2026:
--------------------------------------------

{quote}
> I say it's better to sacrifice write guarantee.

I don't grok why sync is the default, especially given how sketchy hardware 
drivers are about obeying fsync:

{panel}
But, beware: some hardware devices may in fact cache writes even during 
fsync, and return before the bits are actually on stable storage, to give the 
appearance of faster performance.
{panel}
{quote}

It's unclear how often this scare-warning is true in practice (scare
warnings tend to spread very easily without concrete data); it's in
the javadocs for completeness' sake.  I expect (though I have no data to
back this up...) that most OS/IO systems "out there" do properly
implement fsync.

{quote}
IMO, it should have been an option which defaults to false, to be enabled only
by users who have the expertise to ensure that fsync() is actually doing what
it advertises. But what's done is done (and Lucy will probably just do
something different.)
{quote}

I think that's a poor default (trades safety for performance), unless
Lucy eg uses a transaction log so you can concretely bound what's lost
on crash/power loss.  Or, if you go back to autocommitting I guess...

If we did this in Lucene, you can have unbounded corruption.  It's not
just the last few minutes of updates...

So, I don't think we should even offer the option to turn it off.  You
can easily subclass your FSDir impl and make sync() a no-op if you
really want to...
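
For illustration only, here is a minimal Java sketch of that subclassing
idea.  SyncingDir and NoSyncDir are hypothetical stand-ins, not Lucene
classes; in real code you would extend your actual FSDirectory
implementation and override whatever sync method your Lucene version
declares.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a file-system directory; NOT Lucene's FSDirectory.
class SyncingDir {
    // Records which files were "synced" so the effect is observable.
    final List<String> syncedFiles = new ArrayList<>();

    // Stand-in for forcing a file's bytes to stable storage (e.g. via fsync).
    void sync(String name) {
        syncedFiles.add(name);
    }
}

// Subclass that gives up the durability guarantee by making sync() a no-op.
class NoSyncDir extends SyncingDir {
    @Override
    void sync(String name) {
        // Deliberately empty: writes may be lost or corrupted on crash/power loss.
    }
}

class NoSyncSketch {
    public static void main(String[] args) {
        SyncingDir durable = new SyncingDir();
        durable.sync("segments_1");
        System.out.println(durable.syncedFiles.size()); // prints 1

        SyncingDir fast = new NoSyncDir();
        fast.sync("segments_1");
        System.out.println(fast.syncedFiles.size());    // prints 0
    }
}
```

The point of the sketch is only that the opt-out lives in user code: the
default stays safe, and whoever overrides sync() owns the risk.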

{quote}
With regard to Lucene NRT, though, turning sync() off would really help. If and 
when some sort of settings class comes about, an enableSync(boolean enabled) 
method seems like it would come in handy.
{quote}

You don't need to turn off sync for NRT -- that's the whole point.  It
gives you a reader without syncing the files.  Really, this is your
safety tradeoff -- it means you can commit less frequently, since the
NRT reader can search the latest updates.  But, your app has
complete control over how it wants to trade safety for performance.
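
As a rough toy model of that tradeoff (hypothetical names throughout; this
is not Lucene's IndexWriter or reader API), the sketch below assumes a
writer whose near-real-time reader sees buffered documents without any
sync, while only commit() pays for a sync and only committed documents
survive a crash.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Lucene's IndexWriter/IndexReader API.
class ToyWriter {
    private final List<String> buffered = new ArrayList<>(); // added since last commit
    private final List<String> durable = new ArrayList<>();  // covered by the last commit
    int syncCount = 0;

    void addDocument(String doc) {
        buffered.add(doc);
    }

    // Near-real-time view: committed plus uncommitted docs, with no sync.
    List<String> openNrtReader() {
        List<String> view = new ArrayList<>(durable);
        view.addAll(buffered);
        return view;
    }

    // Commit is the only operation that pays for a sync.
    void commit() {
        durable.addAll(buffered);
        buffered.clear();
        syncCount++;
    }

    // What would remain after a crash: only committed docs.
    List<String> recoverAfterCrash() {
        return new ArrayList<>(durable);
    }
}

class NrtSketch {
    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument("doc-a");
        writer.addDocument("doc-b");

        System.out.println(writer.openNrtReader().size());     // prints 2
        System.out.println(writer.syncCount);                  // prints 0
        System.out.println(writer.recoverAfterCrash().size()); // prints 0

        writer.commit();
        System.out.println(writer.syncCount);                  // prints 1
        System.out.println(writer.recoverAfterCrash().size()); // prints 2
    }
}
```

How often the app calls commit() is then the knob: fewer commits mean
fewer syncs but more updates at risk between commits.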


> Refactoring of IndexWriter
> --------------------------
>
>                 Key: LUCENE-2026
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2026
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>
> I've been thinking for a while about refactoring the IndexWriter into
> two main components.
> One could be called a SegmentWriter and as the
> name says its job would be to write one particular index segment. The
> default one just as today will provide methods to add documents and
> flushes when its buffer is full.
> Other SegmentWriter implementations would do things like e.g. appending or
> copying external segments [what addIndexes*() currently does].
> The second component's job would be to manage writing the segments
> file and merging/deleting segments. It would know about
> DeletionPolicy, MergePolicy and MergeScheduler. Ideally it would
> provide hooks that allow users to manage external data structures and
> keep them in sync with Lucene's data during segment merges.
> API wise there are things we have to figure out, such as where the
> updateDocument() method would fit in, because its deletion part
> affects all segments, whereas the new document is only being added to
> the new segment.
> Of course these should be lower level APIs for things like parallel
> indexing and related use cases. That's why we should still provide
> easy to use APIs like today for people who don't need to care about
> per-segment ops during indexing. So the current IndexWriter could
> probably keep most of its APIs and delegate to the new classes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

