br...@interlinx.bc.ca wrote on Mon, 05 Nov 2012 10:25 -0500:
> On 12-11-04 05:10 PM, Pete Wyckoff wrote:
> > Which network filesystem and OS are you using?
> The filesystem is Lustre. So not only is it networked, it is
> distributed where the namespace and data store are handled by different
> nodes, so it's not at all as atomic as NFS-on-(say-)ext4. Given that,
> it's entirely possible to imagine a scenario where a namespace (MDT in
> the Lustre nomenclature) operation could get interrupted after the
> namespace entry has been created but before the open(2) completes. So
> the question here is whose responsibility is it to handle that situation?
That's all in the filesystem. Hopefully it doesn't really work
like that because the fs is inconsistent at this point.
ERESTARTSYS handling is done entirely in the kernel, not in glibc
and not in git. A possible in-kernel fix is not to handle any
signals (except KILL) when waiting for the open mechanics to
> > The third option is
> > that there is a bug in the filesystem client.
> Yep. But before we can go on to determining a bug, the proper/expected
> behavior needs to be determined. I guess that's taking this a bit OT
> for this list though. I'm not really sure where else to go to determine
> this though. :-(
You could toss this to Lustre support. Or first try to come up
with a reduced testcase with lots of opens and SIGALRMs racing
against each other. Maybe xfstests or some other suite might
also tickle the bug.
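
A minimal sketch of such a reduced testcase follows. It is only an
illustration: the function name race_opens(), the file naming scheme and
the choice of O_CREAT|O_EXCL are assumptions, not taken from git or from
Lustre. It arms a fast repeating SIGALRM with SA_RESTART and then creates
and removes files in a loop, counting every open() that fails. On a
filesystem that restarts interrupted opens correctly the count should
stay at zero.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Counts delivered alarms; useful when inspecting a run in a debugger. */
static volatile sig_atomic_t alarms_seen;

static void on_alarm(int sig)
{
    (void)sig;
    alarms_seen++;
}

/*
 * Hypothetical helper: race open(2) against a repeating SIGALRM.
 * dir is a directory on the filesystem under test, interval_usec is
 * the timer period (must be below one second).
 * Returns the number of failed opens, or -1 if setup failed.
 */
int race_opens(const char *dir, int iterations, long interval_usec)
{
    struct sigaction sa, old_sa;
    struct itimerval timer, stop;
    char path[4096];
    int failures = 0;

    assert(dir != NULL && iterations >= 0);
    assert(interval_usec > 0 && interval_usec < 1000000);

    /* SA_RESTART: interrupted syscalls should be restarted, not fail. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, &old_sa) < 0)
        return -1;

    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = interval_usec;
    timer.it_value = timer.it_interval;
    if (setitimer(ITIMER_REAL, &timer, NULL) < 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    for (int i = 0; i < iterations; i++) {
        snprintf(path, sizeof(path), "%s/race-%ld-%d",
                 dir, (long)getpid(), i);
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd < 0) {
            /* e.g. EEXIST here would mean a restarted create saw its own entry */
            fprintf(stderr, "open(%s): %s (errno %d)\n",
                    path, strerror(errno), errno);
            failures++;
        } else {
            close(fd);
        }
        unlink(path);
    }

    /* Disarm the timer and restore the previous handler. */
    memset(&stop, 0, sizeof(stop));
    setitimer(ITIMER_REAL, &stop, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    return failures;
}
```

Calling something like race_opens("/path/on/lustre", 100000, 500) from a
small main() and checking for a non-zero return would be one way to use
it; the path and the numbers are placeholders to tune until the race
shows up, if it does at all.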
I don't think it is feasible to try to handle this error
condition in applications.