[ https://issues.apache.org/jira/browse/DERBY-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854289#comment-16854289 ]
David Sitsky commented on DERBY-7034:
-------------------------------------

[~bryanpendleton], [~rhillegas] - any advice on my last comment?

> Derby's sync() handling can lead to database corruption (at least on Linux)
> ----------------------------------------------------------------------------
>
>                 Key: DERBY-7034
>                 URL: https://issues.apache.org/jira/browse/DERBY-7034
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.14.2.0
>            Reporter: David Sitsky
>            Priority: Major
>
> I recently read about the "fsyncgate 2018" issue that the Postgres team
> raised: https://wiki.postgresql.org/wiki/Fsync_Errors.
> https://lwn.net/Articles/752063/ has a good overview of the issue relating to
> fsync() behaviour on Linux. The short summary is that on some versions of
> Linux, if you retry fsync() after it has failed, the retry will succeed and
> you will end up with corrupted data on disk.
>
> From a quick glance at the Derby code, I have already seen two places where
> sync() is retried in a loop, which is clearly dangerous. There could be other
> areas too.
>
> In LogAccessFile:
> {code}
>     /**
>      * Guarantee all writes up to the last call to flushLogAccessFile on disk.
>      * <p>
>      * A call for clients of LogAccessFile to insure that all data written
>      * up to the last call to flushLogAccessFile() are written to disk.
>      * This call will not return until those writes have hit disk.
>      * <p>
>      * Note that this routine may block waiting for I/O to complete so
>      * callers should limit the number of resource held locked while this
>      * operation is called.  It is expected that the caller
>      * Note that this routine only "writes" the data to the file, this does not
>      * mean that the data has been synced to disk.  The only way to insure that
>      * is to first call switchLogBuffer() and then follow by a call of sync().
>      *
>      **/
>     public void syncLogAccessFile()
>         throws IOException, StandardException
>     {
>         for( int i=0; ; )
>         {
>             // 3311: JVM sync call sometimes fails under high load against NFS
>             // mounted disk.  We re-try to do this 20 times.
>             try
>             {
>                 synchronized( this)
>                 {
>                     log.sync();
>                 }
>
>                 // the sync succeed, so return
>                 break;
>             }
>             catch( SyncFailedException sfe )
>             {
>                 i++;
>                 try
>                 {
>                     // wait for .2 of a second, hopefully I/O is done by now
>                     // we wait a max of 4 seconds before we give up
>                     Thread.sleep( 200 );
>                 }
>                 catch( InterruptedException ie )
>                 {
>                     InterruptStatus.setInterrupted();
>                 }
>
>                 if( i > 20 )
>                     throw StandardException.newException(
>                         SQLState.LOG_FULL, sfe);
>             }
>         }
>     }
> {code}
>
> And LogToFile has similar retry code, but without any handling for
> SyncFailedException:
> {code}
>     /**
>      * Utility routine to call sync() on the input file descriptor.
>      * <p>
>      */
>     private void syncFile( StorageRandomAccessFile raf)
>         throws StandardException
>     {
>         for( int i=0; ; )
>         {
>             // 3311: JVM sync call sometimes fails under high load against NFS
>             // mounted disk.  We re-try to do this 20 times.
>             try
>             {
>                 raf.sync();
>
>                 // the sync succeed, so return
>                 break;
>             }
>             catch (IOException ioe)
>             {
>                 i++;
>                 try
>                 {
>                     // wait for .2 of a second, hopefully I/O is done by now
>                     // we wait a max of 4 seconds before we give up
>                     Thread.sleep(200);
>                 }
>                 catch( InterruptedException ie )
>                 {
>                     InterruptStatus.setInterrupted();
>                 }
>
>                 if( i > 20 )
>                 {
>                     throw StandardException.newException(
>                         SQLState.LOG_FULL, ioe);
>                 }
>             }
>         }
>     }
> {code}
>
> It seems Postgres, MySQL and MongoDB have already changed their code to
> "panic" if an error comes from an fsync() call.
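>
> To illustrate the direction those projects took, here is a rough, hedged
> sketch (not a proposed patch; the escalation path and reuse of
> SQLState.LOG_FULL are assumptions on my part) of what a fail-fast version of
> LogToFile.syncFile() could look like, with the retry loop removed so the
> first sync failure is treated as fatal:
> {code}
>     /**
>      * Sketch only: call sync() exactly once and treat any failure as
>      * unrecoverable.  After a failed fsync(), the kernel may already have
>      * dropped the dirty pages, so a retried fsync() can "succeed" even
>      * though the data never reached disk.
>      */
>     private void syncFile( StorageRandomAccessFile raf)
>         throws StandardException
>     {
>         try
>         {
>             raf.sync();
>         }
>         catch (IOException ioe)
>         {
>             // Do not sleep and retry: escalate immediately so the database
>             // stops (and later recovers from the log) rather than carrying
>             // on over possibly-lost writes.  Reusing SQLState.LOG_FULL here
>             // is purely illustrative; a dedicated "sync failed" state would
>             // be clearer.
>             throw StandardException.newException(
>                 SQLState.LOG_FULL, ioe);
>         }
>     }
> {code}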
> There are many more complexities in how fsync() reports errors (if at all).
> It is worth digging into this further, as I am not familiar with Derby's
> internals and how affected they could be by this.
>
> Interestingly, people have indicated that this issue is more likely to happen
> with network filesystems (since write failures are more common there, e.g.
> when the network goes down), and in the past it was easy just to say "NFS is
> broken", but in actual fact the problem was in some cases with fsync() and
> how it was called in a loop.
>
> I've been trying to find out whether Windows has similar issues, without much
> luck. But given the mysterious corruption issues I have seen in the past with
> Windows/CIFS, I do wonder if this is related somehow.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)