Dear All,
Along with the files that suddenly disappear, we have a bit of a problem
with these that do not.
Namely, in my test runs I can see a small but stable set of files, that
rsync is repeatedly trying to transfer, and then repeatedly fails to
updates their times ; and then the story repeats itself.
The situation is illustrated by the log snippet below, where I have
changed file names to protect owners privacy. In the tests, I am using
rsync 3.1.1 since this is what is presently supported by the OS release
maintainers, and I was trying to avoid any new issues if possible. But
in my understanding, rsync 3.1.2 would have had the same problem here.
This issue is in many ways similar to bug
https://bugzilla.samba.org/show_bug.cgi?id=4977 and the like, and in
this particular case is caused by files whose mtime nanoseconds exist on
the filesystem and are broken. However, since there is a workaround to
bypass a similar issue with symlinks, I believe we can do something
about such files as well: with just a few broken timestamps on a large
filesystem, it is very desirable to be able to avoid or suppress code 23
('partial transfer') when all the file data and 99.9% of the timestamps
were transferred properly.
After the log snippets below, I will try to explain the problem as I see
it, and then suggest two different ways to address it: basically ignore
error 22 (EINVAL) of fix nanosecond mtime .
===
...
2017/04/06 00:16:06 [16000] <f..tp..... filename1
2017/04/06 00:16:06 [16000] <f..tp..... filename2
2017/04/06 00:16:06 [16000] rsync: failed to set times on
"filename1.ezq12g" (in module): Invalid argument (22)
2017/04/06 00:16:06 [16000] rsync: failed to set times on
"filename2.wp5Aju" (in module): Invalid argument (22)
2017/04/06 00:16:09 [16000] <f..t...... filename3
2017/04/06 00:16:09 [16000] rsync: failed to set times on
"filename3.8Fa3LO" (in module): Invalid argument (22)
2017/04/06 00:16:09 [16000] .d..t...... dir1/
2017/04/06 00:16:09 [16000] <f..t...... filename4
2017/04/06 00:16:09 [16000] <f.st...... filename5
2017/04/06 00:16:09 [16000] .d..t...... dir2/
2017/04/06 00:16:09 [16000] <f..t...... filename6
2017/04/06 00:16:09 [16000] .d..t...... dir3/
2017/04/06 00:16:09 [16000] <f.st...... dir3/filename7
2017/04/06 00:16:10 [16000] sent 161,021,998 bytes received 1,858,287
bytes 868,694.85 bytes/sec
2017/04/06 00:16:10 [16000] total size is 1,228,368,779,326 speedup is
7,541.54
2017/04/06 00:16:10 [16000] rsync error: some files/attrs were not
transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.1]
===
With some extra verbosity, the logs give some insight:
===
2017/04/07 18:44:41 [27597] recv_files(filename1)
2017/04/07 18:44:41 [27597] got file_sum
2017/04/07 18:44:41 [27597] set uid of filename1.ZjS8YL from 0 to *****
2017/04/07 18:44:41 [27597] set gid of filename1.ZjS8YL from 0 to *****
2017/04/07 18:44:41 [27597] set modtime of filename1.ZjS8YL to
(370409472) Sun Sep 27 03:31:12 1981
2017/04/07 18:44:41 [27597] rsync: failed to set times on
"filename1.ZjS8YL" (in users): Invalid argument (22)
2017/04/07 18:44:41 [27597] renaming filename1.ZjS8YL to filename1
===
Both 'cp -a filename1 somewhere_else' and 'touch -r filename1
something_else' fail as well, and in a bug tracker we can stop here with
a WONTFIX.
However, copying times with stat() / utime() from filename1 to its
destination fixes all its time transfer problems once and for all, so
certainly something can be done here.
Now I will have to apologize for the length of the letter -- and start
discussing possible solutions.
(a) First of all, we can _fix_ the time on destination. ( I do not _own_
the files in question; I'm just trying to make sure the data is
replicated from one location to another -- ideally, as accurately as
possible. )
Debugging the GNU/touch utility code, I can see that for all these files
.tv_nsec field of the timespec structure gets bogus values: either
negative, or far over 1000000000 (one second), and in my opinion this
eventually causes utimensat() / futimens() to fail .
To remind, struct timespec is defined as follows:
===
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};
===
Apart from a terrible idea to define a disk-related data field being
dependent on the CPU word size ( both time_t and long have the word size
on GCC ), there is an obvious problem here: while tv_sec field, by
definition, _can_not_ have a "wrong" value (e.g. negative numbers here
may look weird, but they are not, per se, _wrong_) -- tv_nsec field
actually can have an unacceptable value.
First of all, it is not expected to be negative: imho on Linux
UTIME_OMIT is (-2), that is, we ban negative nanoseconds.
Next, although the standard never says so, it is apparently not
/expected/ for these values to be greater than one second: for 32-bit
longs, this is not a big deal, since a couple of extra seconds may just
lie within very common time jitter. For 64-bit longs, however, one could
stuff the whole Unix timestamp in that field, so we would probably want
to ban that as well for the sake of general sanity.
Now back to rsync: looking at set_modtime() definition in util.c, one
can clearly see that rsync developers take a very sane and reasonable
approach by declaring mod_nsec as an uint32 -- that is, both unsigned,
and having a byte-fixed reasonable size. ( This is imho how tv_nsec
field should have been declared! )
Furthermore, rsync 3.1.1 -- see set_file_attrs() in rsync.c -- compares
only mtime _seconds_, and takes no attempt to adjust mtimes further if
the "classic" .st_mtime values coincide. ( And this is why all the
"Invalid argument (22)" problems disappear after we transfer .st_mtime
values by an utime() call. )
Therefore, still for rsync 3.1.1 -- first, any bogus negative value for
.st_mtim.tv_nsec will be silently converted to a uint32 ; and second,
any bogus value which is far over one second will be truncated.
I'll discuss version 3.1.2 in a moment, but it is rather tempting to
agree that if we discard bogus nanosecond .st_mtim.tv_nsec values and
set them to 0, we will in fact transfer all the same data and copy the
timestamps as close as we can, and moreover -- will not attempt to
transfer these files again.
This is a proposed patch to do the above:
https://gist.github.com/anonymous/1eb789c825a8eaa1bdb4116334ba52da .
Version 3.1.2 takes a slightly more aggressive approach, and tries to
compare the nanosecond part of the timestamp as well. However, since
--modify-window parameter is still supported, in my understanding this
happens only if the file was originally nominated for a transfer, so
that the selection is still controlled by cmp_time() from util.c ; in
other words, if the destination has the same value of .st_mtime field as
the source, and the same contents etc -- no data transfer would be
attempted.
One could possibly recompile rsync, telling it to ignore all nanosecond
values whatsoever, but this is not exactly what we want here: it is
rather desirable to transfer the existing terabytes of data as
accurately as possible, omitting just a few files whose times were
broken anyway.
At the moment I can not determine what makes these files to appear in
the first place, but the fact is that they exist, and although are very
small in numbers, can be very annoying.
(b) Another option is to allow rsync to ignore setting mtime failure,
and to return RERR_OK irregardless of that.
The downside is that rsync will keep retrying to resend this data over
and over again. The plus side, however, is that setting mtimes is a very
complex task that can go wrong in a number of different ways on a number
of platforms, so it may be desirable to let the user ignore this error
in the case if the bulk of the data was transferred correctly.
In that case, we will probably need another rsync option, something like
--ignore-missing-mtimes . Since choosing a proper name is a whole
another problem, I am adding here a "wrong" patch that does not
introduce extra rsync options, but rather extends
'--ignore-missing-args' functionality to ignore mtime setting errors as
well :
https://gist.github.com/anonymous/e6c95dc424837e09015104b1d5a2a59e
This patch works for both rsync 3.1.1 and 3.1.2, since the addressed
code did not change much between the versions.
I sincerely hope that rsync can address the nanosecond mtimes issue in
one way or another, since apparently it arises from time to time and
when it does, can be very annoying.
Sincerely,
George Fedorov
--
George Fedorov
Senior Systems Specialist
Melbourne School of Engineering
The University of Melbourne, Victoria 3010, Australia
phone: *****
email: *****
http://www.eng.unimelb.edu.au/
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html