Dear Andreas,
thanks a lot for the explanation! Using lustre_rsync is only a temporary
solution for us. We created a full copy of a Lustre system in a
different location using ZFS snapshots. This system has then been
renamed, and the idea was to keep it synchronized with the original
system for a limited time. Afterwards, the new system should be used in
production.
With your explanation and a look into the source code, I have now taken
a closer look at the statuslog that was created. The entries point
almost exclusively to temporary files: files that are created and then
deleted or moved shortly afterwards. We now use the statuslog to
identify directories that need synchronization using standard rsync.
That works as long as the parent of the element to synchronize is still
available; for other cases, a periodic traditional rsync will be
necessary.
Cheers,
Robert
On 21.05.22 at 01:32, Andreas Dilger wrote:
On May 20, 2022, at 06:33, Robert Redl <[email protected]> wrote:
Dear Lustre Experts,
for a few weeks now we have been keeping two Lustre systems
synchronized using lustre_rsync. That works fine, but the statuslog
file is growing. It is currently about 500MB in size. Updating it is
apparently slowing down the whole process.
Is it only important to keep the statuslog in cases where
lustre_rsync has been interrupted? Or is it necessary to keep it
forever in order to not miss any changes?
It should be noted that lustre_rsync is not commonly used and only
tested in the context of an automated regression test that runs and
largely passes. It was developed originally as a proof of concept for
Lustre Changelogs, so may be missing support for newer features (e.g.
explicit file layouts, project IDs, ACLs, etc, though that *may* all
be handled by rsync). There may be unknown bugs lurking in this code,
so use with some caution (i.e. don't sync your bank transaction
records with it).
I would recommend at least using some other tool (e.g. MPIFileUtils)
to periodically do a full scan to verify that the files are being
copied over properly to the target filesystem.
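As a rough illustration of what such a periodic verification scan does,
here is a minimal single-process sketch that walks both trees and
compares file sizes. It is not MPIFileUtils and is not parallel; the
function names are illustrative assumptions, and a size comparison is
only a cheap first check, not a content checksum.

```python
import os


def scan(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat so that symlinks are not followed
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def compare_trees(src, dst):
    """Return (missing_in_dst, extra_in_dst, size_differs) as sorted lists."""
    a, b = scan(src), scan(dst)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, differing
```

On a large filesystem a serial walk like this will be slow, which is
the reason a parallel tool is suggested for the real scan.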
Taking a quick look into lustre_rsync.c, I see that "statuslog"
appears to be a log file of pending rename actions, or something like
that? It is backed up when lustre_rsync is started, but only to a
file <statuslog>.old. It looks like an entry is added into "parents"
if it is renamed to/from a directory that doesn't exist in the target,
but I don't know enough detail to say why that isn't working properly
in your case.
Feedback, patches, and status updates are welcome. Maybe you can
present your usage of it at LAD this year? I don't want to totally
discourage your usage of lustre_rsync, since it has potential for
further improvements (e.g. parallel copying, bug fixing, etc.) and
will never get better otherwise; I just wanted to make sure you
know the current state of this tool.
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org