Hi,

Curt wrote:
> What does the following mean, then, in that light:
>   Because of delayed allocation and other performance optimizations, ext4's
>   behavior of writing files to disk is different from ext3. In ext4, when a
>   program writes to the file system, it is not guaranteed to be on-disk unless
>   the program issues an fsync() call afterwards.
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-ext4

Well, the wording "delayed allocation and other performance optimizations"
could mean a lot of weird things.
But the subsequent paragraphs clearly concern the RAM-to-disk migration of
memory pages which are associated with the filesystem's disk storage.


>  And the delayed allocation will actually commit the data to disk only after
>  30-150 seconds (it is not very clear on this exact window of data loss)

I understand this and the quote above to mean that the data content already
exists in RAM (i.e. the written inode data with the birth timestamp, plus the
file content as written by the running script) but reaches the physical
storage medium only later.
The motivation for these discussions is possible data loss if the system
suddenly stops working. Inconsistent filesystem behavior is not mentioned
(though it is to be expected when the next run of the system encounters the
dirty filesystem).
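For reference, the usual way for a program to shrink that data-loss window
is to call fsync(2) after writing. Below is a minimal C sketch, assuming
Linux/POSIX; the helper name append_durably() is hypothetical. It opens the
file with the same flags and mode as the openat() call in the strace output
further down, writes the data, and then flushes both the file and its
directory, because the fsync(2) man page notes that syncing a file does not
necessarily make its directory entry durable.

```c
#define _POSIX_C_SOURCE 200809L   /* openat(), O_DIRECTORY */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write all of `len` bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: append `data` to dir/name and return 0 only after
 * fsync() succeeded on the file and on its directory; -1 on any error.
 * Without fsync(), ext4 with delayed allocation may keep the data only in
 * the page cache for a while before writing it back. */
int append_durably(const char *dir, const char *name, const char *data)
{
    int rc = -1;
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    /* Same flags and mode as the openat() seen in the strace output. */
    int fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fd >= 0) {
        if (write_all(fd, data, strlen(data)) == 0
            && fsync(fd) == 0       /* flush file data and inode */
            && fsync(dirfd) == 0)   /* flush the directory entry too */
            rc = 0;
        if (close(fd) != 0)
            rc = -1;
    }
    close(dirfd);
    return rc;
}
```

Note that this only matters for a crash or power loss. Other processes on
the same running system read through the page cache and see the data right
after write(), with or without fsync().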


An explanation of the observed problem would need:
- a mechanism which delayed the creation of the inode's content while the
  inode was already in use for open and write,
- or a mechanism which caused ext4 to hide the inode from other processes
  and to write a wrong birth timestamp,
- or a mechanism which deleted the file shortly after it was created and
  re-created it 30 seconds later with its full expected content,
- or something I cannot think of yet.

(The birth timestamp happens to match roughly the time when the file finally
became visible to other processes. It also matches the modification time of
the file once it had become visible.)
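To compare the two timestamps directly, one can ask the kernel for the birth
time with statx(2). A minimal sketch, assuming Linux 4.11 or later and glibc
2.28 or later (where statx() is declared in <sys/stat.h> under _GNU_SOURCE);
the function names file_times() and print_file_times() are hypothetical. Not
every filesystem reports a birth time, so the code checks STATX_BTIME in
stx_mask. Recent GNU coreutils "stat" shows the same value on its "Birth:"
line.

```c
#define _GNU_SOURCE            /* statx() and struct statx in glibc >= 2.28 */
#include <fcntl.h>             /* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: fetch birth time and modification time of `path`
 * in seconds since the epoch.
 * Returns 0 if both are known, 1 if the filesystem does not report a birth
 * time (only *mtime is set), -1 if statx() fails (errno is set). */
int file_times(const char *path, long long *btime, long long *mtime)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_BTIME | STATX_MTIME, &stx) != 0)
        return -1;
    *mtime = (long long)stx.stx_mtime.tv_sec;
    if (!(stx.stx_mask & STATX_BTIME))
        return 1;              /* birth time not available here */
    *btime = (long long)stx.stx_btime.tv_sec;
    return 0;
}

/* Print both timestamps side by side. */
void print_file_times(const char *path)
{
    long long btime = 0, mtime = 0;
    int rc = file_times(path, &btime, &mtime);

    if (rc < 0)
        perror(path);
    else if (rc == 1)
        printf("%s: mtime=%lld, birth time not reported\n", path, mtime);
    else
        printf("%s: btime=%lld mtime=%lld\n", path, btime, mtime);
}
```

Passing the path of the affected output file to print_file_times() shows
whether its birth time and modification time really coincide.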

--------------------------------------------------------------------------

The strange strace output from around the time when the file finally appeared:

  openat(AT_FDCWD, "….out", O_WRONLY|O_CREAT|O_APPEND, 0666 <unfinished ...>
  <... openat resumed>)  = 3

does not indicate any disturbance. It only means that strace had to deal
with more than one thread or process at that moment. Between the two lines
above there should have been another line with a system call from the other
thread or process, though.
man 1 strace:

   If a system call is being executed and meanwhile another one is being
   called from a different thread/process then strace will try to preserve
   the order of those events and mark the ongoing call as being unfinished.
   When the call returns it will be marked as resumed.

   [pid 28772] select(4, [3], NULL, NULL, NULL <unfinished ...>
   [pid 28779] clock_gettime(CLOCK_REALTIME, {1130322148, 939977000}) = 0
   [pid 28772] <... select resumed> )      = 1 (in [3])


Have a nice day :)

Thomas
