I have a locking/fsync question.

I have an app that keeps job metadata in an XML file that resides on a Lustre filesystem (I actually just discovered my running system has it on NFS, but I'm seeing an anomaly on Lustre so I'll keep writing).

It uses libxml to read and write the file, so every update has to read the whole file into memory, make changes, and write it back out.

The approach I'm taking to this is:

        open file => fd
        lock fd (using fcntl F_SETLKW)
        read from fd
        ftruncate fd
        <make modifications>
        fsync fd
        unlock fd
        close fd
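
For concreteness, here is a minimal C sketch of that sequence. It is not the app's actual code: the names locked_rewrite and set_lock are made up for illustration, the libxml parsing is replaced by a plain byte buffer, and I'm assuming an explicit rewind and write between the ftruncate and the fsync, which the list above doesn't spell out.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: take (F_WRLCK) or drop (F_UNLCK) a blocking
 * fcntl lock covering the whole file. */
static int set_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: whole file */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: read the old contents into old[] (at most
 * old_cap bytes), then replace the file with new_data, all under one
 * lock. Returns the number of old bytes read, or -1 on error.
 * Simplifications: no EINTR retry, and old contents beyond old_cap
 * are not preserved. */
ssize_t locked_rewrite(const char *path, char *old, size_t old_cap,
                       const char *new_data, size_t new_len)
{
    ssize_t n, got = 0;
    size_t put = 0;
    int ok = 0;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    if (set_lock(fd, F_WRLCK) < 0) {
        close(fd);
        return -1;
    }

    /* read from fd until EOF or until the buffer is full */
    while ((size_t)got < old_cap) {
        n = read(fd, old + got, old_cap - (size_t)got);
        if (n < 0)
            goto out;
        if (n == 0)
            break;
        got += n;
    }

    /* ftruncate does not move the file offset, so rewind explicitly */
    if (ftruncate(fd, 0) < 0 || lseek(fd, 0, SEEK_SET) < 0)
        goto out;

    /* write the modified contents back through the same fd */
    while (put < new_len) {
        n = write(fd, new_data + put, new_len - put);
        if (n <= 0)
            goto out;
        put += (size_t)n;
    }

    if (fsync(fd) < 0)
        goto out;
    ok = 1;
out:
    set_lock(fd, F_UNLCK);
    close(fd);
    return ok ? got : -1;
}
```

In this sketch the file is empty from the ftruncate until the write completes, and every step goes through the one locked fd.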

The Lustre system has 4 OSSes, and I'm running the test across 12 compute nodes, all of which have the filesystem mounted with the flock option (it falls over immediately without flock). I'm running Lustre 1.6.6.

What I'm seeing is that, occasionally, the file reads will pick up an empty or partial file. This doesn't seem like it should be the case, but I'm sure I'm missing something. I don't see any errors showing up on the MDS.

thanks,
--bob
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
