Hello, I'm working on an implementation of the pax tool, and am looking for some clarification of how (if at all) the -s option should interact with symlink targets.
Here are the relevant parts of the specification that I found:

> -s replstr
>     Modify file or archive member names named by pattern or file
>     operands according to the substitution expression replstr, using
>     the syntax of the ed utility

> In read mode, the archive members shall be selected based on the
> user-specified pattern operands as modified by the -c, -n, and
> -u options. Then, any -s and -i options shall modify, in that
> order, the names of the selected files.

It is clear that the substitution should apply to archive member names, but it doesn't say anything about linkpaths.

Since the linkpath of a hardlink member is the name of another archive member, it makes sense that it should apply there as well. On the other hand, symlink targets are relative to the symlink member name, or they may be absolute and not the name of an archive member at all, so I don't think it makes sense to match them with the same pattern intended for full archive member names.

However, BSD pax seems to apply the substitution to all name and linkname fields, and has this lengthy comment[0] about the issue:

> IMPORTANT: We have a problem. what do we do with symlinks?
> Modifying a hard link name makes sense, as we know the file it
> points at should have been seen already in the archive (and if it
> wasn't seen because of a read error or a bad archive, we lose
> anyway). But there are no such requirements for symlinks. On one
> hand the symlink that refers to a file in the archive will have to
> be modified to so it will still work at its new location in the
> file system. On the other hand a symlink that points elsewhere (and
> should continue to do so) should not be modified. There is clearly
> no perfect solution here. So we handle them like hardlinks. Clearly
> a replacement made by the interactive rename mapping is very likely
> to be correct since it applies to a single file and is an exact
> match. The regular expression replacements are a little harder to
> justify though. We claim that the symlink name is only likely
> to be replaced when it points within the file tree being moved and
> in that case it should be modified. what we really need to do is to
> call an oracle here. :)

In particular, I found that it is quite useful to use -s ',^[^/]*/,,' to strip the leading directory when extracting archives (similar to --strip-components=1 in GNU tar). However, if there is a symlink in some directory in the archive with target ../foo.h, the BSD behavior causes it to get transformed to foo.h, which is broken.

What's the intended behavior here?

Thanks,
Michael

[0] https://github.com/NetBSD/src/blob/15249100f6b3e7d709a0f35a53a5d0c456ae348b/bin/pax/pat_rep.c#L656-L674
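
P.S. To make the ../foo.h case concrete, here is a rough Python sketch of the two behaviors I am comparing. It is only an illustration, not code from any pax implementation: the function names, the example paths, and the "bsd" / "names-only" policy labels are all made up for this message, and I am assuming that Python's re module treats this particular pattern the same way an ed-style basic regular expression would.

```python
import re

# The expression from -s ',^[^/]*/,,' : match the first path component
# and its trailing slash.
STRIP_LEADING_DIR = re.compile(r"^[^/]*/")


def substitute(path):
    """Apply the substitution once, like an ed 's' command without 'g'."""
    return STRIP_LEADING_DIR.sub("", path, count=1)


def rewrite_member(name, linkpath, is_symlink, symlink_policy):
    """Return the (name, linkpath) pair after applying -s.

    symlink_policy is a hypothetical switch for this example:
      "bsd"        - rewrite symlink targets like any other linkpath
      "names-only" - rewrite member names and hardlink linkpaths only
    """
    new_name = substitute(name)
    if linkpath is None:
        # Regular file or directory: nothing else to rewrite.
        return new_name, None
    if is_symlink and symlink_policy == "names-only":
        # Leave the symlink target exactly as stored in the archive.
        return new_name, linkpath
    return new_name, substitute(linkpath)


# A symlink at pkg-1.0/include/sub/foo.h pointing at ../foo.h
print(rewrite_member("pkg-1.0/include/sub/foo.h", "../foo.h", True, "bsd"))
print(rewrite_member("pkg-1.0/include/sub/foo.h", "../foo.h", True, "names-only"))
```

With the "bsd" policy the target becomes foo.h, so the extracted link at include/sub/foo.h points at itself. With "names-only" the target stays ../foo.h and still resolves to include/foo.h after the leading directory is stripped. Hardlink linkpaths are rewritten under both policies, since they name other archive members.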