[jira] [Commented] (LUCENE-3237) FSDirectory.fsync() may not work properly

Michael McCandless (JIRA) Thu, 10 Apr 2014 03:47:35 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965214#comment-13965214
 ]


Michael McCandless commented on LUCENE-3237:
--------------------------------------------

Thanks Simon.

bq. Hey mike, thanks for reopening this. 

I actually didn't reopen yet ... because I do think this really is
paranoia.  The OS man pages make the semantics clear, and what we are
doing today (reopen the file for syncing) is correct.
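
For reference, the reopen-and-sync approach can be sketched roughly as follows. This is a minimal illustration of the behavior described in the issue (open a new RandomAccessFile, sync its FileDescriptor, close it), not the actual FSDirectory code; the class name ReopenSync and the method signature are assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of Lucene: illustrates "reopen the file for syncing".
final class ReopenSync {
    /** Reopens an already-written file and asks the OS to flush it to stable storage. */
    static void fsync(File file) throws IOException {
        // A fresh descriptor is opened here; the descriptor that wrote the data
        // may already be closed by the time this runs.
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.getFD().sync();
        }
    }
}
```

Whether syncing through a second descriptor covers writes made through the first one is exactly the question this issue raises; the sketch only shows the mechanics.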

bq. I like the fact that we get rid of the general unsynced files stuff in 
Directory.
bq. given the last point we move it in the right place inside IW that is where 
it should be

Yeah I really like that.

But, we could do that separately, i.e. add private tracking inside IW
of which newly written file names haven't been sync'd.
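
A rough sketch of what such private tracking might look like, assuming a small helper owned by IW. The class and method names (UnsyncedFileTracker, fileWritten, pending, markSynced) are hypothetical and do not exist in Lucene.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: remembers which newly written file names still need an fsync.
final class UnsyncedFileTracker {
    private final Set<String> unsynced = new HashSet<>();

    /** Record a file that has been written but not yet sync'd. */
    synchronized void fileWritten(String name) {
        unsynced.add(name);
    }

    /** Snapshot of the names that still need syncing; the tracker is not modified. */
    synchronized List<String> pending() {
        return new ArrayList<>(unsynced);
    }

    /** Forget names only after their sync has actually succeeded. */
    synchronized void markSynced(Collection<String> names) {
        unsynced.removeAll(names);
    }
}
```

Names are removed only after a successful sync, so a failed fsync leaves them pending for the next attempt.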

bq. the problem that the current patch has is that it holds on to the buffers 
in BufferedIndexOutput. I think we need to work around this; here are a couple 
of ideas:
bq. introduce a SyncHandle class that we can pull from IndexOutput that allows 
to close the IndexOutput but lets you fsync after the fact

I think that's a good idea.  For FSDir impls this is just a thin
wrapper around FileDescriptor.

bq. this handle can be refcounted internally and we just decrement the count on 
IndexOutput#close() as well as on SyncHandle#close()
bq. we can just hold on to the SyncHandle until we need to sync in IW

Ref counting may be overkill?  Who else will be pulling/sharing this
sync handle?  Maybe we can add an "IndexOutput.closeToSyncHandle": the
IndexOutput flushes and is unusable from then on, but it returns the
sync handle, which the caller must later close.
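
One possible shape for this, as a sketch only. SketchOutput stands in for an FSDir IndexOutput and is not a Lucene class; SyncHandle and closeToSyncHandle follow the names proposed in this thread, but their signatures here are assumptions.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Thin wrapper around the still-open file; holds no write buffer.
final class SyncHandle implements Closeable {
    private final FileOutputStream stream;

    SyncHandle(FileOutputStream stream) {
        this.stream = stream;
    }

    /** fsync using the same descriptor that wrote the bytes. */
    void sync() throws IOException {
        stream.getFD().sync();
    }

    /** Releases the underlying file descriptor. */
    @Override
    public void close() throws IOException {
        stream.close();
    }
}

// Hypothetical stand-in for a buffered, file-backed IndexOutput.
final class SketchOutput {
    private final FileOutputStream raw;
    private BufferedOutputStream buffered;

    SketchOutput(File file) throws IOException {
        raw = new FileOutputStream(file);
        buffered = new BufferedOutputStream(raw);
    }

    void writeBytes(byte[] bytes) throws IOException {
        if (buffered == null) throw new IllegalStateException("output already closed");
        buffered.write(bytes);
    }

    /** Flushes buffered bytes, drops the buffer, and hands the open file to the caller. */
    SyncHandle closeToSyncHandle() throws IOException {
        if (buffered == null) throw new IllegalStateException("output already closed");
        buffered.flush();
        buffered = null; // the buffer can be collected; only the descriptor stays open
        return new SyncHandle(raw);
    }
}
```

The caller keeps the returned handle until commit, calls sync(), then close(). Since the handle has a single owner, no ref counting is needed in this variant.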

One downside of moving to this API is ... it rules out writing some
bytes, fsyncing, writing some more, fsyncing, e.g. if we wanted to add
a transaction log impl on top of Lucene.  But I think that's OK
(design for today).  There are other limitations in IndexOutput for
an xlog impl...

bq.since this will basically close the underlying FD later we might want to 
think about size-bounding the number of unsynced files and maybe let indexing 
threads fsync them concurrently? maybe something we can do later.
bq.if we know we flush for commit we can already fsync directly which might 
save resources / time since it might be concurrent

Yeah we can pursue this in "phase 2".  The OS will generally move
dirty buffers to stable storage anyway over time, so the cost of
fsyncing files written (relatively) long ago (tens of seconds; on Linux
I think the default is usually 30 seconds) will usually be low.  The
problem is that on some filesystems fsync can be unexpectedly costly
(there was a "famous" case in ext3
https://bugzilla.mozilla.org/show_bug.cgi?id=421482 but this has been
fixed), so we need to be careful about this.


> FSDirectory.fsync() may not work properly
> -----------------------------------------
>
>                 Key: LUCENE-3237
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3237
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Shai Erera
>         Attachments: LUCENE-3237.patch
>
>
> Spinoff from LUCENE-3230. FSDirectory.fsync() opens a new RAF, sync() its 
> FileDescriptor and closes RAF. It is not clear that this syncs whatever was 
> written to the file by other FileDescriptors. It would be better if we do 
> this operation on the actual RAF/FileOS which wrote the data. We can add 
> sync() to IndexOutput and FSIndexOutput will do that.
> Directory-wise, we should stop syncing on file names, and instead sync on the 
> IOs that performed the write operations.


