I might be misunderstanding LUCENE-1044. Several approaches were
proposed, and I am not certain which one ended up in the final patch.
I reread the issue and am still a bit unclear.
If the segments are sync'd as part of the commit, then yes, that
would suffice. The merges don't need to commit; you just can't delete
the segments until the merge completes.
I think that building the segments and syncing each one is going to
be slower than writing the documents/operations to a log file, since
in most cases the caller is going to call commit as part of each
update. But a lot depends on how Lucene is used (interactive vs.
batch, lots of updates vs. a few).
I am not sure how deletions are impacted by all of this.
On Feb 7, 2008, at 9:21 AM, robert engels wrote:
This is simply not true. Two different issues are at play. You
cannot have a true 'commit' unless it is synchronous!
LUCENE-1044 might allow the index to be brought back to a
consistent state, but not one that is consistent with a
synchronization point.
For example, I write three documents to the index. I call commit.
It returns. After this, those documents MUST be in the index under
any conditions. LUCENE-1044 does not ensure this.
If the operations (deletes and updates) are written to a log file
first and the log file is synced, then a failure during index
writing/merging can be fixed by rolling the log forward.
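
A minimal sketch of the log-first idea described above, in plain Java
NIO. Nothing here is Lucene API: the OperationLog class, its
length-prefixed record format, and the method names are hypothetical,
chosen only for illustration. The point it shows is that commit()
returns only after the log has been forced to storage, so every
operation appended before that call can be replayed after a crash.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operation log; an illustration, not part of Lucene. */
public class OperationLog implements AutoCloseable {
    private final FileChannel channel;

    public OperationLog(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends one length-prefixed record. Not durable until commit(). */
    public void append(String operation) throws IOException {
        byte[] body = operation.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
        buf.putInt(body.length);
        buf.put(body);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    /** Synchronous commit: returns only after the log is forced to storage. */
    public void commit() throws IOException {
        channel.force(true);
    }

    /** Reads back all complete records; a torn trailing record is ignored. */
    public static List<String> replay(Path path) throws IOException {
        List<String> ops = new ArrayList<>();
        try (FileChannel in = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer header = ByteBuffer.allocate(4);
            while (true) {
                header.clear();
                if (!readFully(in, header)) break;
                header.flip();
                int len = header.getInt();
                // A length that runs past end of file means a partial write.
                if (len < 0 || len > in.size() - in.position()) break;
                ByteBuffer body = ByteBuffer.allocate(len);
                if (!readFully(in, body)) break;
                ops.add(new String(body.array(), StandardCharsets.UTF_8));
            }
        }
        return ops;
    }

    private static boolean readFully(FileChannel in, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (in.read(buf) < 0) return false;
        }
        return true;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

On recovery, the caller would feed the result of replay() back into
the index writer. Deciding which logged operations are already
reflected in the index is left out of this sketch.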
On Feb 7, 2008, at 4:29 AM, Michael McCandless wrote:
In fact this is exactly the approach in the final patch on
LUCENE-1044 and it gives far better performance than the simply
synchronous (original) approach of syncing every segment file on
close.
Using a transaction log would also require periodic syncing.
LUCENE-1044 syncs files after every merge, in the background
thread of ConcurrentMergeScheduler, which is nice because it does
not block further add/update/deleteDocument calls on the writer.
Mike
Andrew Zhang wrote:
On Feb 7, 2008 7:22 AM, robert engels <[EMAIL PROTECTED]> wrote:
That doesn't help, with lazy writing/buffering by the OS, there is
no guarantee that if the last written block is ok, that earlier
blocks in the file are....
The OS/drive is going to physically write them in the most efficient
manner. Only after a sync would this hold true (which is what we are
trying to avoid).
Hi, how about an asynchronous commit, i.e. using a thread to sync the
data? We would only need to make sure that all data is written to
storage before the next operation.
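
The asynchronous commit suggested above could be sketched roughly as
follows. This is plain Java, not Lucene code: AsyncSyncWriter and its
methods are hypothetical names. commitAsync() hands the sync to a
background thread and returns immediately, and the next write() or
commitAsync() first waits for any sync still in flight. Note that the
data is not guaranteed durable when commitAsync() returns, only once
awaitPendingSync() has returned.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical writer that syncs in a background thread; single caller assumed. */
public class AsyncSyncWriter implements AutoCloseable {
    private final FileChannel channel;
    private final ExecutorService syncThread = Executors.newSingleThreadExecutor();
    private Future<?> pendingSync;

    public AsyncSyncWriter(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE);
    }

    /** Writes data, but only after any earlier sync has completed. */
    public void write(byte[] data) throws Exception {
        awaitPendingSync();
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    /** Starts a sync in the background and returns without waiting for it. */
    public void commitAsync() throws Exception {
        awaitPendingSync();
        pendingSync = syncThread.submit(() -> {
            channel.force(true);
            return null;
        });
    }

    /** Blocks until the outstanding sync, if any, has finished. */
    public void awaitPendingSync() throws Exception {
        if (pendingSync != null) {
            pendingSync.get();
            pendingSync = null;
        }
    }

    @Override
    public void close() throws Exception {
        try {
            awaitPendingSync();
        } finally {
            syncThread.shutdown();
            channel.close();
        }
    }
}
```

A caller that needs the stronger guarantee can call
awaitPendingSync() right after commitAsync(), which makes the pair
behave like a synchronous commit.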
On Feb 6, 2008, at 5:15 PM, DM Smith wrote:
On Feb 6, 2008, at 5:42 PM, Michael McCandless wrote:
robert engels wrote:
Do we have any way of determining if a segment is definitely
OK/VALID?
The only way I know is the CheckIndex tool, and it's rather slow
(and it's not clear that it always catches all corruption).
Just a thought. It seems that the discussion has revolved around
whether a crash or similar event has left the file in an
inconsistent state. Without looking into how it is actually done,
I'm going to guess that the writing is done from the start of the
file to its end. That is, no "out of order" writing.
If this is the case, how about adding a marker to the end of the
file of a known size and pattern. If it is present then it is
presumed that there were no errors in getting to that point.
Even with out of order writing, one could write an 'INVALID' marker
at the beginning of the operation and then upon reaching the end of
the writing, replace it with the valid marker.
If neither marker is found then the index is one from before the
capability was added and nothing can be said about the validity.
-- DM
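
The trailing-marker idea above might look something like this in
Java. The EndMarker class, the 8-byte pattern, and both method names
are made up for illustration; Lucene's file formats are not assumed
to contain any such marker. As pointed out elsewhere in this thread,
without a sync a present marker does not prove that earlier blocks
reached the disk, so this only detects a write that stopped early.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical fixed-size end-of-file marker; an illustration only. */
public class EndMarker {
    // Known size and pattern; the actual bytes are an arbitrary choice.
    static final byte[] VALID = "LUC_OK!!".getBytes(StandardCharsets.US_ASCII);

    /** Appends the marker once all other content has been written. */
    public static void appendMarker(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(raf.length());
            raf.write(VALID);
        }
    }

    /** Returns true if the file ends with the marker pattern. */
    public static boolean hasMarker(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < VALID.length) {
                return false;
            }
            byte[] tail = new byte[VALID.length];
            raf.seek(raf.length() - VALID.length);
            raf.readFully(tail);
            return Arrays.equals(tail, VALID);
        }
    }
}
```

A file for which hasMarker() returns false is either incomplete or
predates the marker, which matches the "nothing can be said" case
described above.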
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Best regards,
Andrew Zhang
db4o - database for Android: www.db4o.com
http://zhanghuangzhu.blogspot.com/