[ 
https://issues.apache.org/jira/browse/COMPRESS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18092693#comment-18092693
 ] 

Tomas Illuminati commented on COMPRESS-727:
-------------------------------------------

Thanks @Marcono1234, that PoC is convincing and you're right. The .. after a 
symlink component is resolved by the OS relative to where the link points, not 
lexically the way normalize() does it, and since the escaping entry can be 
created first pointing at a target that does not exist yet, a real-path check 
at creation can't catch it either. So ALLOW_WITHIN_ROOT cannot fully guarantee 
that the created links stay within the root once they are followed in 
combination.

@garydgregory @ppkarwasz if you agree with documenting it this way 
(best-effort, with REJECT/SKIP as the safe modes), I'll proceed.

> Provide a symlink-resistant (and concurrency-safe) archive Extractor
> --------------------------------------------------------------------
>
>                 Key: COMPRESS-727
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-727
>             Project: Commons Compress
>          Issue Type: Improvement
>            Reporter: Piotr Karwasz
>            Priority: Major
>         Attachments: COMPRESS-727.patch
>
>
> h2. Summary
> {{ArchiveEntry.resolveIn(Path)}} protects against classic Zip-slip by 
> resolving the entry name against the target directory, normalizing it, and 
> rejecting the result if it no longer {{startsWith}} the target. This check is 
> purely *lexical*: it manipulates path strings only. It does not look at the 
> file system, so it does not defend against *symlink* attacks, and the example 
> {{Expander}} that drives extraction does not create symlinks at all (so it is 
> both lossy for legitimate archives and silent about the risk).
> This issue proposes a first-class, security-first {{Extractor}} that safely 
> materializes a whole archive into a directory, including symlinks, and 
> resists both symlink-slip from the archive itself and concurrent symlink 
> races from third parties. The API is centered on 
> {{Extractor.newExtractor(Path)}}, which returns a {{SecureExtractor}} when 
> the platform supports {{java.nio.file.SecureDirectoryStream}}, or a 
> best-effort {{Extractor}} otherwise.
> h2. Background: what resolveIn does and does not cover
> {code:java}
> default Path resolveIn(final Path parentPath) throws IOException {
>     final Path outputFile = parentPath.resolve(getName()).normalize();
>     if (!outputFile.startsWith(parentPath)) {
>         throw new ArchiveException("Zip slip ...");
>     }
>     return outputFile;
> }
> {code}
> This stops {{../../etc/passwd}} style names. It does *not* stop:
> * A symlink *entry inside the archive* (for example {{link -> /etc}} or 
> {{link -> ../..}}) followed by a later entry written *through* that link 
> ({{link/evil}}). The lexical check on {{target/link/evil}} passes because the 
> string starts with {{target}}, but {{link}} is a symlink and the write 
> escapes. This is "symlink-slip", and it defeats the lexical guard even when 
> the target directory is fully trusted.
> * A symlink (or directory-to-symlink swap) planted *concurrently* by another 
> process between the {{resolveIn}} check and the actual 
> {{Files.newOutputStream}} call. The check and the write are not atomic with 
> respect to the intermediate path components, so this is a classic TOCTOU race.
> The metadata needed to handle links is already exposed:
> * {{TarArchiveEntry}}: {{isSymbolicLink()}}, {{isLink()}} (hard link), 
> {{getLinkName()}}, plus {{isBlockDevice()}} / {{isCharacterDevice()}} / 
> {{isFIFO()}} for special types.
> * {{ZipArchiveEntry}}: {{isUnixSymlink()}}, with 
> {{ZipFile.getUnixSymlink(entry)}} returning the link target.
> There is currently no use of {{SecureDirectoryStream}}, 
> {{LinkOption.NOFOLLOW_LINKS}}, or {{Files.createSymbolicLink}} anywhere in 
> main code, so this is greenfield.
> h2. Threat model
> # *Zip-slip* (path traversal via {{..}} or absolute names). Already handled 
> by {{resolveIn}}.
> # *Symlink-slip from the archive* (trusted target directory). The archive 
> plants a symlink and then writes through it, or supplies a symlink whose 
> target points outside the extraction root. Must be handled by an extractor 
> that controls symlink creation and never follows a link when materializing a 
> child path.
> # *Concurrent symlink race* (untrusted / shared target directory). A third 
> party with write access to the target tree swaps a path component for a 
> symlink during extraction. Defeating this requires file-descriptor-relative, 
> one-component-at-a-time resolution with no-follow semantics, that is, 
> {{SecureDirectoryStream}}.
> h2. Real-life examples of untrusted / shared target directories
> These are settings where threat 3 is realistic because another user or 
> process can write into the extraction tree while extraction runs:
> * *Shared CI / build agents*: multiple jobs extract into a common workspace 
> or temp directory on the same host; one job races another.
> * *Multi-user package-manager caches*: extracting into group- or 
> world-writable shared caches (Maven {{~/.m2}}, Gradle, pip, npm) on a shared 
> build box.
> * *Web upload processing*: an upload is extracted into a directory that other 
> worker processes or requests (possibly other tenants) can also write.
> * *Container image / layer extraction*: a layer's symlink redirects a later 
> layer's write. Container runtimes specifically defend this (for example 
> Docker/containerd use chroot-style joins, and Go's {{filepath-securejoin}} 
> exists for exactly this).
> * *Background daemons*: antivirus, indexers, or backup agents racing the 
> extractor over the same tree.
> Prior art that hardened against the same classes of bug:
> * The original Zip-slip disclosure (2018) prompted the lexical fix that 
> {{resolveIn}} implements.
> * Python's {{tarfile}} extraction filters (PEP 706, shipped in 3.12, 
> hardening the long-standing CVE-2007-4559) added {{data}} / {{tar}} / 
> {{fully_trusted}} filters that, among other things, reject symlinks and links 
> escaping the destination.
> * GNU tar performs "secure" extraction by removing an existing symlink before 
> writing through its name and by deferring symlink creation.
> * Container runtimes rely on {{filepath-securejoin}} (Go) to resolve each 
> component safely against symlink races.
> h2. Proposed API
> A supported {{Extractor}} type (promoted out of {{archivers.examples}} into, 
> for example, {{org.apache.commons.compress.archivers}}), with a 
> capability-detecting factory:
> {code:java}
> // Picks the strongest implementation the platform/filesystem allows.
> public static Extractor newExtractor(Path targetDirectory) throws IOException;
> {code}
> * On platforms where {{Files.newDirectoryStream(targetDirectory) instanceof 
> SecureDirectoryStream}} (Unix-like systems, see 
> {{sun.nio.fs.UnixSecureDirectoryStream}}), {{newExtractor}} returns a 
> *{{SecureExtractor}}* that is resistant to both symlink-slip and concurrent 
> symlink races.
> * Otherwise (notably Windows, or filesystems whose provider does not return a 
> secure stream) it returns a base {{Extractor}} that defends symlink-slip from 
> the archive on a best-effort basis but documents that it cannot fully close 
> the TOCTOU race.
> The {{Extractor}} mirrors the entry points of today's {{Expander}} 
> ({{ArchiveInputStream}}, {{ZipFile}}, {{TarFile}}, {{SevenZFile}}) and adds 
> configuration:
> {code:java}
> Extractor extractor = Extractor.newExtractor(targetDir)
>         .setSymlinkPolicy(SymlinkPolicy.ALLOW_WITHIN_ROOT) // default REJECT
>         .setSpecialFilePolicy(SpecialFilePolicy.SKIP)      // devices, FIFOs
>         .setOverwrite(false);
> extractor.extract(archiveInputStream);
> {code}
> h3. SecureExtractor design
> Walk each entry's parent path *component by component* starting from an open 
> directory handle on the (validated) extraction root:
> * For each intermediate component, open the subdirectory via 
> {{SecureDirectoryStream.newDirectoryStream(name, 
> LinkOption.NOFOLLOW_LINKS)}}. If a component is a symlink, refuse (it cannot 
> be a legitimate parent under no-follow).
> * Create files relative to the innermost directory handle with no-follow / 
> create-new semantics ({{newByteChannel}} with {{NOFOLLOW_LINKS}} + 
> {{CREATE_NEW}}), so a concurrently planted symlink cannot redirect the write.
> * Create directories and symlinks relative to the directory handle, never via 
> an absolute resolved path.
> Because every step is relative to a file descriptor rather than a re-resolved 
> string path, a third party cannot win the race by swapping a parent component.
> h3. Symlink policy
> ||Policy||Behavior||
> |{{REJECT}} (default)|Fail if the archive contains any symlink entry.|
> |{{SKIP}}|Silently ignore symlink entries.|
> |{{ALLOW_WITHIN_ROOT}}|Create a symlink only if its target resolves to a 
> location inside the extraction root; reject links escaping the root. Never 
> follow the new link when writing later entries.|
> |{{ALLOW_ALL}}|Create symlinks verbatim (only for fully trusted archives).|
> h3. Entry-type handling
> ||Entry type||Default action||
> |Regular file|Extract (no-follow, create-new unless overwrite enabled).|
> |Directory|Create (no-follow).|
> |Symbolic link|Per {{SymlinkPolicy}} (default {{REJECT}}).|
> |Hard link (tar {{isLink()}})|Resolve link target within root, no-follow; 
> reject if it escapes.|
> |Device / FIFO / other special|Per {{SpecialFilePolicy}} (default {{SKIP}}; 
> never created by the secure path).|
> h2. Platform support and degradation
> * {{SecureDirectoryStream}} is available on Unix-like platforms via the 
> default file system provider; {{newExtractor}} detects it with an 
> {{instanceof}} check on a directory stream opened over the target.
> * On platforms without it, the base {{Extractor}} still validates each parent 
> component with {{LinkOption.NOFOLLOW_LINKS}} (via {{Files.readAttributes}}) 
> and refuses to traverse existing symlinks, which closes symlink-slip from the 
> archive but remains non-atomic against a concurrent attacker. This residual 
> limitation must be documented clearly.
> h2. Acceptance criteria
> * New {{Extractor}} (supported, not under {{examples}}) with 
> {{newExtractor(Path)}} returning {{SecureExtractor}} when 
> {{SecureDirectoryStream}} is available, else a best-effort {{Extractor}}.
> * Configurable {{SymlinkPolicy}} (default {{REJECT}}) and special-file 
> handling; safe hard-link resolution.
> * {{SecureExtractor}} performs descriptor-relative, component-by-component, 
> no-follow resolution and create-new writes.
> * Entry points covering {{ArchiveInputStream}}, {{ZipFile}}, {{TarFile}}, and 
> {{SevenZFile}}, matching the existing {{Expander}} surface.
> * Tests: symlink-slip archive (link then write-through) is blocked under a 
> trusted target; an escaping symlink target is rejected; a concurrent 
> directory-to-symlink swap is defeated by {{SecureExtractor}} (and documented 
> as not fully defeated by the base extractor); legitimate within-root symlinks 
> extract correctly when allowed.
> * Javadoc plus a "safe extraction" section in the user guide, 
> cross-referencing {{ArchiveEntry.resolveIn}} and the platform caveats.
> * Consider deprecating / redirecting {{Expander}} toward {{Extractor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to