Hi list,

As I understand it, CephFS implements hard links as "smart soft
links": one link is the primary for the inode and the others merely
reference it. When it comes to directories, the size of a hardlinked
file is only accounted for in the recursive stats of the directory
holding the "primary" link. This is good (no double-accounting).

I'd like to be able to control *which* of those hard links is the
primary, post-facto, to control what directory their size is accounted
under. I want to write a tool that takes some rules as to which
directories should be "preferred" for containing the primary link, and
corrects it if necessary (by recursively stat()ing everything and
looking for files with the same inode number to enumerate all links).
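
For illustration, the enumeration step could look roughly like the
Python sketch below. The function name find_hardlink_groups is made up
for this example, and it only sees links that live under the given
root, so a group with fewer paths than st_nlink has links elsewhere.

```python
import os
from collections import defaultdict


def find_hardlink_groups(root):
    """Group paths under root by inode, keeping only hardlinked files.

    Returns a dict mapping (st_dev, st_ino) to a sorted list of paths.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}
```

This tells you where all the links are, but not which one is the
primary.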

To swap out a primary link with another I came up with this sequence:

link("old_primary", "tmp1")
symlink("tmp1", "tmp2")
rename("tmp2", "old_primary") // old_primary replaced with another inode
stat("/otherdir/new_primary") // new_primary hopefully takes over stray
rename("tmp1", "old_primary") // put things back the way they were

The idea is that, since renames of hardlinks over themselves are a no-op
in POSIX and won't work, I need to use an intermediate symlink step to
ensure continuity of access to the old file; this isn't 100% transparent
but it beats e.g. removing old_primary and re-linking new_primary over
it (which would cause old_primary to vanish for a short time, which is
undesirable). Hopefully the stat() ensures that the new_primary is what
takes over the stray inode. This seems to work in practice; if there is
a better way, I'd like to hear it.
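
As a rough sketch, the same sequence in Python might look like this.
The function name swap_primary and the temporary file naming are
invented for the example, there is no cleanup if a step fails, and
whether the stat() really makes new_primary take over the stray inode
is the same hope as above, not something this code can guarantee.

```python
import os
import uuid


def swap_primary(old_primary, new_primary):
    """Run the link/symlink/rename dance described above."""
    parent = os.path.dirname(os.path.abspath(old_primary))
    tag = uuid.uuid4().hex
    tmp1 = os.path.join(parent, ".swap-%s.link" % tag)
    tmp2 = os.path.join(parent, ".swap-%s.sym" % tag)

    os.link(old_primary, tmp1)                # extra link in the same dir
    os.symlink(os.path.basename(tmp1), tmp2)  # relative symlink to tmp1
    os.rename(tmp2, old_primary)              # old_primary is now a symlink
    os.stat(new_primary)                      # hopefully takes over the stray
    os.rename(tmp1, old_primary)              # restore the real hard link
```

On a non-CephFS filesystem this is just an elaborate no-op: old_primary
ends up pointing at the same inode it started with.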

Figuring out which link is the primary is a bigger issue. Only
directories report recursive stats where this matters, not files
themselves. On a directory with hardlinked files, if ceph.dir.rfiles >
sum(ceph.dir.rfiles for each subdir) + count(files with st_nlink == 1)
then some hardlinked files are primary; I could attempt to use this
formula and then just do the above dance for every hardlinked file to
move the primaries off, but this seems fragile and likely to break in
certain situations (or do needless work). Any other ideas?
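
To make the formula concrete, here is a sketch of the check in Python.
The names read_rfiles and count_primary_hardlinks are invented for
this example. It assumes ceph.dir.rfiles is readable as a decimal
string via getxattr, that every non-directory entry (symlinks
included) counts as a file in rfiles, and that the recursive stats are
up to date when read, none of which I have verified.

```python
import os


def read_rfiles(path):
    """Read the ceph.dir.rfiles virtual xattr (CephFS on Linux only)."""
    return int(os.getxattr(path, "ceph.dir.rfiles").decode())


def count_primary_hardlinks(directory, rfiles=read_rfiles):
    """Estimate how many hardlinked files directly in `directory` are primary.

    rfiles is injectable so the arithmetic can be tried off CephFS.
    """
    subdir_total = 0
    single_link_files = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdir_total += rfiles(entry.path)
            elif entry.stat(follow_symlinks=False).st_nlink == 1:
                single_link_files += 1
    # Whatever is left over should be hardlinked files accounted here.
    return rfiles(directory) - subdir_total - single_link_files
```

A result greater than zero only says that some hardlinked files in the
directory are primary, not which ones.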

Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
ceph-users mailing list
