Hi Murtadha, Feel free to abandon my submitted patch. PS - the current transaction log and LSM disk components also use FileChannel.force() to force data to disk.
Best regards, Chen Luo On Mon, Oct 28, 2019 at 6:25 AM Murtadha Hubail <[email protected]> wrote: > P.S. > Looks like FileChannel.force() is enough according to [1] and we don't > need FileDescriptor.sync(). > > [1] > https://stackoverflow.com/questions/5650327/are-filechannel-force-and-filedescriptor-sync-both-needed > > > On 10/28/2019, 3:22 PM, "Murtadha Hubail" <[email protected]> wrote: > > @Mike Carey, > > For this specific system checkpoint file, we have a test that ensures > even if the file is corrupted, the node will continue its startup and we > will just force recovery to start from the beginning of the transaction log. > > @Chen Luo, > > Thanks for checking and highlighting the issue. Looks like forcing a > file on closing its stream is OS dependent. Apparently, we also need > FileDescriptor.sync() for non-local storage to guarantee a file is forced. > We have other metadata files that we need to do this for. If you don't > mind, I will submit a patch to cover all of these and I will include the > changes from your patch [1]. > > Cheers, > Murtadha > > [1] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/3923 > > On 10/28/2019, 9:56 AM, "Mike Carey" <[email protected]> wrote: > > Sounds right ... Explicit call needed! Better crash testing also > needed; > this one should not be impossible to test for. Do we have any > crash > testing that we can extend? > > On Sun, Oct 27, 2019, 7:15 PM Chen Luo <[email protected]> wrote: > > > HI Murtadha, > > > > I don't think closing a file will automatically force it to > disk. It only > > forces the buffer from JVM to OS. I did some search online and > also checked > > the source code of JDK 8. When a file channel is closed, only > the close > > function (provided by Linux) is called [1]. The only place that > calls fsync > > is inside the FileChannel.force() method [2]. For correctness, I > believe > > force should be called explicitly, as we did when an LSM merge is > > completed. > > > > [1] > > > > > https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L273 > > [2] > > > > > https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L145 > > > > Best regards, > > Chen Luo > > > > On Sun, Oct 27, 2019 at 5:24 PM Murtadha Hubail < > [email protected]> > > wrote: > > > > > Hi Chen, > > > > > > If I'm not mistaken, Files#write ensures that the file is > closed after > > > writing the bytes which should flush the file. If not, then we > probably > > > should add an explicit flush there. > > > > > > Cheers, > > > Murtadha > > > > > > On 10/27/2019, 8:26 PM, "Chen Luo" <[email protected]> wrote: > > > > > > Hi devs, > > > > > > I noticed the checkpoint file is not forced to disk after > completion, > > > but > > > we still proceed to truncate logs and older checkpoint > files [1]. > > This > > > seems to be a bug to me. Also, from my understanding, > reading the > > > checkpoint file without forcing to disk will still succeed > because > > the > > > file > > > can be read from the OS write cache. Is there any other > > considerations > > > for > > > not forcing checkpoint files? > > > > > > > > > [1] > > > > > > > > > https://github.com/apache/asterixdb/blob/2a76a0fe83fc5534c00923cd0f09f8477eac713a/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/recovery/AbstractCheckpointManager.java#L176 > > > > > > > > > > > > > > > > > > >
