A suggestion:

Try just direct IO.  Group locks are complicated and have challenging 
semantics, including the unavoidable possibility of a stale file size, and 
they require you to be very careful in your application.  In recent versions 
of Lustre, direct IO avoids taking locks on the client entirely.  Instead, it 
takes locks purely on the server side, and only for the extent (offset and 
length) of the IO itself.  This avoids all of the problems with shared-file 
lock contention for non-overlapping writes, and still gives 100% expected 
semantics for overlapping reads and writes.

So, if you are able to switch to direct IO as you mention, group locks should 
be unnecessary and are better avoided.  Direct IO works like this in 2.15 and 
newer.  (Also, in 2.17, hybrid IO can do this switch for you automatically for 
larger IO sizes.)
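
To make that concrete, here is a rough sketch of what one rank's write could 
look like with direct IO.  This is only an illustration, not code from Lustre 
or from your application: the names rank_extent and write_direct, the 4096 
byte ALIGN value, and the "one fixed-size chunk per rank" layout are all 
assumptions of mine.  O_DIRECT generally wants the buffer, offset and length 
aligned, so the sketch uses an anonymous mmap to get a page-aligned buffer.

```python
import mmap
import os

# Assumed alignment for O_DIRECT buffers, offsets and lengths (hypothetical
# value; check what your kernel and filesystem actually require).
ALIGN = 4096


def rank_extent(rank, chunk_bytes):
    """Return (offset, length) of the non-overlapping region owned by rank."""
    if chunk_bytes <= 0 or chunk_bytes % ALIGN != 0:
        raise ValueError("chunk_bytes must be a positive multiple of ALIGN")
    return rank * chunk_bytes, chunk_bytes


def write_direct(path, rank, payload, chunk_bytes):
    """Write this rank's chunk with O_DIRECT, bypassing the page cache."""
    offset, length = rank_extent(rank, chunk_bytes)
    if len(payload) != length:
        raise ValueError("payload must be exactly chunk_bytes long")

    # An anonymous mmap is page-aligned, which satisfies the buffer
    # alignment that O_DIRECT typically needs.
    buf = mmap.mmap(-1, length)
    buf[:] = payload

    # os.O_DIRECT is only available on Linux.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        written = os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()

    if written != length:
        # A real application would retry the remainder instead.
        raise OSError("short write: %d of %d bytes" % (written, length))
    return written
```

Each rank would call something like write_direct(path, rank, data, chunk) 
with no group lock held.  Since the extents from rank_extent never overlap, 
the server-side locks described above should not contend with each other.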

Patrick
________________________________
From: lustre-discuss <[email protected]> on behalf of 
Oleg Drokin via lustre-discuss <[email protected]>
Sent: Saturday, February 7, 2026 12:49 AM
To: [email protected] <[email protected]>; 
[email protected] <[email protected]>
Subject: Re: [lustre-discuss] Group Lock Semantics

Hello!

On Fri, 2026-02-06 at 18:09 -0800, Freddie Witherden via lustre-discuss
wrote:

> So, we reworked our code slightly to ensure that each page is only ever
> written to by a single rank.  However, even here we find data to
> occasionally be missing from the file with the offsets corresponding to
> boundaries between hosts.  We have even tried increasing the size up to
> the stripe size for the file (so each N MiB stripe is only ever written
> to by a single rank) but to no avail.
>
> Hence, I am wondering what the specific semantics are for writes under a
> group lock?  Do we have to use O_DIRECT and bypass the page cache, are
> there more significant alignment requirements than pages?

I think O_DIRECT was the primary intended use case, since otherwise mixed
IO from the same host might get confused about which pages are covered by
which locks, but in general it's still supposed to work without any
particular alignment requirements.

Do you happen to have a simplistic test case demonstrating the problem
by any chance?

Bye,
   Oleg
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
