Re: Checkpoint file not forced to disk

Chen Luo Mon, 28 Oct 2019 09:32:17 -0700

Hi Murtadha,

Feel free to abandon my submitted patch. PS - the current transaction log
and LSM disk components also use FileChannel.force() to force data to disk.


Best regards,
Chen Luo

On Mon, Oct 28, 2019 at 6:25 AM Murtadha Hubail <[email protected]> wrote:

> P.S.
> Looks like FileChannel.force() is enough according to [1] and we don't
> need FileDescriptor.sync().
>
> [1]
> https://stackoverflow.com/questions/5650327/are-filechannel-force-and-filedescriptor-sync-both-needed
>
>
> On 10/28/2019, 3:22 PM, "Murtadha Hubail" <[email protected]> wrote:
>
>     @Mike Carey,
>
>     For this specific system checkpoint file, we have a test that ensures
> even if the file is corrupted, the node will continue its startup and we
> will just force recovery to start from the beginning of the transaction log.
>
>     @Chen Luo,
>
>     Thanks for checking and highlighting the issue. Looks like forcing a
> file on closing its stream is OS dependent. Apparently, we also need
> FileDescriptor.sync() for non-local storage to guarantee a file is forced.
> We have other metadata files that we need to do this for. If you don't
> mind, I will submit a patch to cover all of these and I will include the
> changes from your patch [1].
>
>     Cheers,
>     Murtadha
>
>     [1] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/3923
>
>     On 10/28/2019, 9:56 AM, "Mike Carey" <[email protected]> wrote:
>
>         Sounds right ... Explicit call needed!  Better crash testing also
> needed;
>         this one should not be impossible to test for.  Do we have any
> crash
>         testing that we can extend?
>
>         On Sun, Oct 27, 2019, 7:15 PM Chen Luo <[email protected]> wrote:
>
>         > HI Murtadha,
>         >
>         > I don't think closing a file will automatically force it to
> disk. It only
>         > forces the buffer from JVM to OS. I did some search online and
> also checked
>         > the source code of JDK 8. When a file channel is closed, only
> the close
>         > function (provided by Linux) is called [1]. The only place that
> calls fsync
>         > is inside the FileChannel.force() method [2]. For correctness, I
> believe
>         > force should be called explicitly, as we did when an LSM merge is
>         > completed.
>         >
>         > [1]
>         >
>         >
> https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L273
>         > [2]
>         >
>         >
> https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L145
>         >
>         > Best regards,
>         > Chen Luo
>         >
>         > On Sun, Oct 27, 2019 at 5:24 PM Murtadha Hubail <
> [email protected]>
>         > wrote:
>         >
>         > > Hi Chen,
>         > >
>         > > If I'm not mistaken, Files#write ensures that the file is
> closed after
>         > > writing the bytes which should flush the file. If not, then we
> probably
>         > > should add an explicit flush there.
>         > >
>         > > Cheers,
>         > > Murtadha
>         > >
>         > > On 10/27/2019, 8:26 PM, "Chen Luo" <[email protected]> wrote:
>         > >
>         > >     Hi devs,
>         > >
>         > >     I noticed the checkpoint file is not forced to disk after
> completion,
>         > > but
>         > >     we still proceed to truncate logs and older checkpoint
> files [1].
>         > This
>         > >     seems to be a bug to me. Also, from my understanding,
> reading the
>         > >     checkpoint file without forcing to disk will still succeed
> because
>         > the
>         > > file
>         > >     can be read from the OS write cache. Is there any other
>         > considerations
>         > > for
>         > >     not forcing checkpoint files?
>         > >
>         > >
>         > >     [1]
>         > >
>         > >
>         >
> https://github.com/apache/asterixdb/blob/2a76a0fe83fc5534c00923cd0f09f8477eac713a/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/recovery/AbstractCheckpointManager.java#L176
>         > >
>         > >
>         > >
>         > >
>         >
>
>
>
>
>

Re: Checkpoint file not forced to disk

Reply via email to