pax -s option and symlink targets

Michael Forney via austin-group-l at The Open Group Sat, 03 Jul 2021 18:44:32 -0700

Hello,

I'm working on an implementation of the pax tool, and am looking
for some clarification of how (if at all) the -s option should
interact with symlink targets.


Here are the relevant parts of the specification that I found:

> -s replstr
> Modify file or archive member names named by pattern or file
> operands according to the substitution expression replstr, using
> the syntax of the ed utility

> In read mode, the archive members shall be selected based on the
> user-specified pattern operands as modified by the -c, -n, and
> -u options. Then, any -s and -i options shall modify, in that
> order, the names of the selected files.

It is clear that the substitution should apply to archive member
names, but the specification doesn't say anything about linkpaths.

Since the linkpath of a hardlink member is the name of another
archive member, it makes sense that it should apply there as well.

On the other hand, symlink targets are relative to the symlink
member name, or they may be absolute and not the name of an archive
member at all, so I don't think it makes sense to match them with
the same pattern intended for full archive member names. However,
BSD pax seems to apply the substitution to all name and linkname
fields, and has this lengthy comment[0] about the issue:

> IMPORTANT: We have a problem. what do we do with symlinks?
> Modifying a hard link name makes sense, as we know the file it
> points at should have been seen already in the archive (and if it
> wasn't seen because of a read error or a bad archive, we lose
> anyway). But there are no such requirements for symlinks. On one
> hand the symlink that refers to a file in the archive will have to
> be modified to so it will still work at its new location in the
> file system. On the other hand a symlink that points elsewhere (and
> should continue to do so) should not be modified. There is clearly
> no perfect solution here. So we handle them like hardlinks. Clearly
> a replacement made by the interactive rename mapping is very likely
> to be correct since it applies to a single file and is an exact
> match. The regular expression replacements are a little harder to
> justify though. We claim that the symlink name is only likely
> to be replaced when it points within the file tree being moved and
> in that case it should be modified. what we really need to do is to
> call an oracle here. :)

In particular, I found that it is quite useful to use -s ',^[^/]*/,,'
to strip the leading directory when extracting archives (similar
to --strip-components=1 in GNU tar). However, if there is a symlink
in some directory in the archive with target ../foo.h, the BSD
behavior transforms the target to foo.h, which breaks the link.
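The breakage can be reproduced with a small Python sketch. This is only a model of the two behaviors, not pax code: the member name pkg-1.0/src/foo.h is a hypothetical example, the helper names are made up for illustration, and the ed-style expression is approximated with Python's re module (equivalent for this pattern).

```python
import posixpath
import re

# Python approximation of the ed-style expression s,^[^/]*/,,
STRIP_LEADING_DIR = re.compile(r"^[^/]*/")

def substitute(path):
    # Apply the substitution once, as -s would without the g flag.
    return STRIP_LEADING_DIR.sub("", path, count=1)

def resolve(member_name, target):
    # Resolve a relative symlink target against the directory
    # that contains the symlink itself.
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(member_name), target))

# Hypothetical member: symlink pkg-1.0/src/foo.h -> ../foo.h
name, target = "pkg-1.0/src/foo.h", "../foo.h"

new_name = substitute(name)      # member name after -s
bsd_target = substitute(target)  # target rewritten as BSD pax does

# Where the link pointed inside the original archive layout.
print(resolve(name, target))
# Target left alone: still reaches the (renamed) foo.h.
print(resolve(new_name, target))
# Target rewritten: the link now resolves to itself.
print(resolve(new_name, bsd_target))
```

With the target left unmodified, the link still resolves to the file that pkg-1.0/foo.h was renamed to. With the substitution applied to the target, the link ends up pointing at its own path.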

What's the intended behavior here?

Thanks,

Michael

[0] https://github.com/NetBSD/src/blob/15249100f6b3e7d709a0f35a53a5d0c456ae348b/bin/pax/pat_rep.c#L656-L674
