Hi,

Vincent Lefevre wrote:
> the issue [...] is probably unlikely to occur again.

In this case we will hardly be able to find an explanation.


> However, there's also the fact that the birth time was 30 seconds
> ahead of the actual file creation, while there was no lockup.

Yes, your observations are not yet consistently explainable.
So some of the normal assumptions about your situation must be wrong.
Question is which ones.

The file times which you showed are consistent with a file that was indeed
created and written 30 seconds after your script was supposed to have
created it and to have written to it with repeated name lookups.

> > -rw-r--r--  1     878 2022-04-26 14:43:45 mpfrtests.cventin.lip.ens-lyon.fr.out
> >  Birth: 2022-04-26 14:43:45.537241731 +0200

On the other hand, the content looks like the normal work result of your
script, which had already ended half a minute earlier.

It is normal that data get onto the physical storage medium only quite a
long time after a program wrote them. But this is supposed to be kept
consistent by the VFS and virtual memory of the Linux kernel.


> The script is likely to run on
> the same CPU core, so that the file would still be visible along
> the script, possibly via a cache.

A connection to the CPU cache would be a strange low level problem of
kernel or hardware.
I understand that the filesystem driver writes to memory pages which
are associated with storage device memory. The pages and their association
are managed by the virtual memory facility of the kernel.
  https://www.kernel.org/doc/html/latest/filesystems/vfs.html#the-address-space-object
Any attempt to access the storage device memory associated with a not
yet written page is supposed to be directed to the cached page in RAM.
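
A small sketch of that expectation, in Python: data written without any
fsync should be readable at once through a second open of the same name,
because the read is served from the cached pages. The function name
read_back_without_sync is made up for illustration, and the test only
shows the normal behavior on a local filesystem. It does not reproduce
your glitch.

```python
import os
import tempfile


def read_back_without_sync(payload: bytes) -> bytes:
    """Write payload without fsync and read it back via a second open."""
    fd, path = tempfile.mkstemp()
    try:
        # Deliberately no os.fsync(fd): the data may not be on the medium yet.
        os.write(fd, payload)
        # A separate open by name must still see the data just written.
        with open(path, "rb") as reader:
            return reader.read()
    finally:
        os.close(fd)
        os.unlink(path)
```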

If it indeed has to do with the CPU cache, then a particular cache would have
delayed its writing to RAM for 30 seconds but would have served its own CPU
with the full results of file system driver and virtual memory activities
around the new file. No inconsistent partial results would have reached the
page cache in RAM, where they would have caused protests during your attempts
to see the file.

But I deem it unlikely that the kernel threads which operated filesystem and
virtual memory were (nearly) always running on the same CPU core, behind a
cache which is not shared with all other CPU cores.
Further, if the memory operations were just pending in some secluded cache,
why does the inode then bear the time when that cache would finally have
released its content to the more widely accessible RAM?


If I were in your situation, I'd add diagnostic messages to the script in the
hope (or fear) that the glitch happens again.
Especially the inode numbers during and after the script run would be
interesting.
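
A minimal sketch of such diagnostics, assuming the script can call out to
Python. The helper names stat_snapshot, log_snapshot and same_file are
made up for illustration. Python's os.stat does not report the birth time
on Linux, so the "Birth:" line of stat(1) would still have to be recorded
separately.

```python
import os
import time


def stat_snapshot(path):
    """Return identity and time fields of path, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return {
        "now_ns": time.time_ns(),    # wall clock at the moment of the lookup
        "dev": st.st_dev,            # device number of the filesystem
        "ino": st.st_ino,            # inode number
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,  # last content modification
        "ctime_ns": st.st_ctime_ns,  # last inode change
    }


def log_snapshot(label, path):
    """Print a labeled snapshot immediately and return it for later comparison."""
    snap = stat_snapshot(path)
    print(f"{label}: {path}: {snap}", flush=True)
    return snap


def same_file(a, b):
    """True if both snapshots exist and name the same inode on the same device."""
    if a is None or b is None:
        return False
    return (a["dev"], a["ino"]) == (b["dev"], b["ino"])
```

Taking one snapshot right after the script created the file and another
one after the script ended, and comparing them with same_file(), would
tell whether the name still leads to the same inode.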


Have a nice day :)

Thomas
