[ https://issues.apache.org/jira/browse/DERBY-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854289#comment-16854289 ]

David Sitsky commented on DERBY-7034:
-------------------------------------

[~bryanpendleton], [~rhillegas] - any advice from my last comment?


> Derby's sync() handling can lead to database corruption (at least on Linux)
> ---------------------------------------------------------------------------
>
>                 Key: DERBY-7034
>                 URL: https://issues.apache.org/jira/browse/DERBY-7034
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.14.2.0
>            Reporter: David Sitsky
>            Priority: Major
>
> I recently read about "fsyncgate 2018", which the Postgres team raised: 
> https://wiki.postgresql.org/wiki/Fsync_Errors.  
> https://lwn.net/Articles/752063/ has a good overview of the issue relating to 
> fsync() behaviour on Linux.  The short summary is that on some versions of 
> Linux, if you retry fsync() after it has failed, the retry will report 
> success and you will end up with corrupted data on disk.
> From a quick glance at the Derby code, I have already seen two places where 
> sync() is retried in a loop, which is clearly dangerous.  There could be 
> other areas too.
> In LogAccessFile:
> {code}
>     /**
>      * Guarantee all writes up to the last call to flushLogAccessFile on disk.
>      * <p>
>      * A call for clients of LogAccessFile to insure that all data written
>      * up to the last call to flushLogAccessFile() are written to disk.
>      * This call will not return until those writes have hit disk.
>      * <p>
>      * Note that this routine may block waiting for I/O to complete so 
>      * callers should limit the number of resource held locked while this
>      * operation is called.  It is expected that the caller
>      * Note that this routine only "writes" the data to the file, this does not
>      * mean that the data has been synced to disk.  The only way to insure that
>      * is to first call switchLogBuffer() and then follow by a call of sync().
>      *
>      **/
>     public void syncLogAccessFile() 
>         throws IOException, StandardException
>     {
>         for( int i=0; ; )
>         {
>             // 3311: JVM sync call sometimes fails under high load against
>             // NFS mounted disk.  We re-try to do this 20 times.
>             try
>             {
>                 synchronized( this)
>                 {
>                     log.sync();
>                 }
>                 // the sync succeed, so return
>                 break;
>             }
>             catch( SyncFailedException sfe )
>             {
>                 i++;
>                 try
>                 {
>                     // wait for .2 of a second, hopefully I/O is done by now
>                     // we wait a max of 4 seconds before we give up
>                     Thread.sleep( 200 ); 
>                 }
>                 catch( InterruptedException ie )
>                 {
>                     InterruptStatus.setInterrupted();
>                 }
>                 if( i > 20 )
>                     throw StandardException.newException(
>                         SQLState.LOG_FULL, sfe);
>             }
>         }
>     }
> {code}
> And LogToFile has similar retry code, but it catches the broader IOException 
> rather than just SyncFailedException:
> {code}
>     /**
>      * Utility routine to call sync() on the input file descriptor.
>      * <p> 
>     */
>     private void syncFile( StorageRandomAccessFile raf) 
>         throws StandardException
>     {
>         for( int i=0; ; )
>         {
>             // 3311: JVM sync call sometimes fails under high load against
>             // NFS mounted disk.  We re-try to do this 20 times.
>             try
>             {
>                 raf.sync();
>                 // the sync succeed, so return
>                 break;
>             }
>             catch (IOException ioe)
>             {
>                 i++;
>                 try
>                 {
>                     // wait for .2 of a second, hopefully I/O is done by now
>                     // we wait a max of 4 seconds before we give up
>                     Thread.sleep(200);
>                 }
>                 catch( InterruptedException ie )
>                 {   
>                     InterruptStatus.setInterrupted();
>                 }
>                 if( i > 20 )
>                 {
>                     throw StandardException.newException(
>                                 SQLState.LOG_FULL, ioe);
>                 }
>             }
>         }
>     }
> {code}
> It seems Postgres, MySQL and MongoDB have already changed their code to 
> "panic" if an error comes from an fsync() call.
> There are a lot more complexities around how fsync() reports errors (if at 
> all).  It is worth digging into this further, as I am not familiar with 
> Derby's internals and how badly it could be affected by this.
> Interestingly, people have indicated this issue is more likely to happen on 
> network filesystems (since write failures are more common when the network 
> goes down).  In the past it was easy to just say "NFS is broken", but in 
> some cases the real problem was fsync() and how it was called in a loop.
> I've been trying to find out whether Windows has similar issues, without 
> much luck.  But given the mysterious corruption issues I have seen in the 
> past with Windows/CIFS, I do wonder if this is related somehow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
