michaelselehov wrote:
## Background
A compressed offload bundle (CCOB) stores one or more code objects in a
zstd/zlib-compressed payload behind a small header. Several places in the tree
read these bundles back, and they all support **multiple concatenated bundles**
in one buffer (e.g. when a linker merges `.hip_fatbin` input sections from
several translation units into one section). To walk that sequence, each reader
needs to know where one bundle ends and the next begins.
## The bug
All three readers locate that boundary by scanning the buffer for the literal
4-byte magic `"CCOB"`:
```cpp
// llvm/lib/Object/OffloadBundle.cpp (extractOffloadBundle)
// clang/lib/Driver/OffloadBundler.cpp (ListBundleIDsInFile, UnbundleFiles)
NextBundleStart = Buffer.find("CCOB", 4);
...
MemoryBuffer::getMemBuffer(Buffer.take_front(NextBundleStart), ...); // -> decompress
```
The magic is `0x43 0x43 0x4F 0x42`. Nothing prevents those four bytes from
occurring **by chance inside a compressed payload**. When they do, the scan
stops in the middle of the current bundle, `take_front` truncates it, and the
truncated buffer is handed to the decompressor. zstd then sees a source size
that is smaller than the frame it is told to decode and bails out:
```
Failed to decompress input: Could not decompress embedded file contents: Src size is incorrect
```
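The truncation can be reproduced in isolation. The following is a minimal sketch using `std::string_view` rather than LLVM's `StringRef`; `naiveBundleEnd` and `makeHeaderlessToyBundle` are hypothetical names introduced for illustration, and the toy bundle is just the magic followed by opaque bytes, not the real CCOB header layout.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the old splitter: the end of the first bundle is
// taken to be the offset of the next "CCOB" found after the leading magic.
// If no further magic exists, the whole buffer is treated as one bundle.
std::size_t naiveBundleEnd(std::string_view Buffer) {
  std::size_t Next = Buffer.find("CCOB", 4);
  return Next == std::string_view::npos ? Buffer.size() : Next;
}

// Toy bundle (assumption: NOT the real header): magic + opaque payload bytes.
std::string makeHeaderlessToyBundle(std::string_view Payload) {
  std::string Bundle = "CCOB";
  Bundle.append(Payload);
  return Bundle;
}
```

For a payload such as `0x01 0x02 'C' 'C' 'O' 'B' 0x05 0x06`, `naiveBundleEnd` returns 6 even though the toy bundle is 12 bytes long, which is the same kind of short buffer that the decompressor rejects above.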
At runtime this path is reached through comgr's `AMD_COMGR_ACTION_UNBUNDLE`, so
the user-visible symptom is `hipErrorInvalidImage` — *"device kernel image is
invalid"* — when loading the affected code object, followed by *"named symbol
not found"* for the kernels it should have contained.
A few properties that made this confusing to triage:
* **The bundle is not corrupt.** Decompressing the full payload with the system
`zstd` (or the in-tree decompressor) succeeds and yields the expected bytes.
Only the *outer splitter* truncates the buffer; `decompress()` itself already
uses the header's `FileSize` to size the frame.
* **It is data-dependent.** Only bundles whose compressed bytes happen to
contain `"CCOB"` are affected. Two objects built from the same sources with
slightly different contents can differ in whether they trip the bug, so it
looks intermittent and "object-specific" and has no single causing commit.
* **It surfaced with wider use of compression.** It became reachable once
compressed bundles became the common case; before that, the scan rarely had
enough compressed data to collide with the magic.
## The fix
Stop using the magic to find where a bundle *ends*. The compressed header
already records the bundle's on-disk size (`FileSize`, present for header
versions V2/V3 via `CompressedBundleHeader::tryParse`), which is authoritative.
Use it to delimit the current bundle, and only search for the next bundle's
`"CCOB"` magic *past* the end of the current one:
```cpp
if (std::optional<size_t> Size = getCompressedBundleSize(Buffer)) {
  CurBundleEnd = *Size;                          // exact end of this bundle
  NextBundleStart = Buffer.find("CCOB", *Size);  // next bundle, if any
} else {
  // Legacy V1 header without a size field: keep the previous behaviour.
  NextBundleStart = Buffer.find("CCOB", 4);
  CurBundleEnd = NextBundleStart;
}
```
This keeps the multi-bundle iteration intact — searching from `*Size` rather
than from offset 4 also tolerates any alignment padding between concatenated
bundles — while a magic-byte collision inside a payload is no longer able to
truncate anything. The change is applied identically in all three readers:
* `clang/lib/Driver/OffloadBundler.cpp` — `ListBundleIDsInFile`
* `clang/lib/Driver/OffloadBundler.cpp` — `UnbundleFiles`
* `llvm/lib/Object/OffloadBundle.cpp` — `extractOffloadBundle`
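
To make the size-delimited walk concrete, here is a self-contained sketch of the same idea outside LLVM. It assumes a deliberately simplified toy header (the magic followed by a 32-bit little-endian total size), which is not the real CCOB layout; `toyBundleSize`, `splitBundles`, and `makeSizedToyBundle` are hypothetical helpers, and the sketch omits the legacy V1 fallback.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Toy layout (assumption, NOT the real CCOB header):
//   bytes 0..3  "CCOB"
//   bytes 4..7  total bundle size (header + payload), little-endian u32
constexpr std::size_t ToyHeaderSize = 8;

// Returns the recorded bundle size, or nullopt if the header is missing,
// malformed, or claims more bytes than the buffer holds.
std::optional<std::size_t> toyBundleSize(std::string_view Buffer) {
  if (Buffer.size() < ToyHeaderSize || Buffer.substr(0, 4) != "CCOB")
    return std::nullopt;
  std::uint32_t Size = 0;
  for (int I = 0; I < 4; ++I)
    Size |= static_cast<std::uint32_t>(
                static_cast<unsigned char>(Buffer[4 + I]))
            << (8 * I);
  if (Size < ToyHeaderSize || Size > Buffer.size())
    return std::nullopt;
  return Size;
}

// Splits concatenated toy bundles. The header size delimits each bundle; the
// magic is only searched for past that end, which also skips any padding.
std::vector<std::string_view> splitBundles(std::string_view Buffer) {
  std::vector<std::string_view> Bundles;
  while (!Buffer.empty()) {
    std::optional<std::size_t> Size = toyBundleSize(Buffer);
    if (!Size)
      break; // no usable size: stop rather than guess
    Bundles.push_back(Buffer.substr(0, *Size));
    std::size_t Next = Buffer.find("CCOB", *Size);
    if (Next == std::string_view::npos)
      break;
    Buffer.remove_prefix(Next);
  }
  return Bundles;
}

// Builds a toy bundle around an opaque payload.
std::string makeSizedToyBundle(std::string_view Payload) {
  auto Size = static_cast<std::uint32_t>(ToyHeaderSize + Payload.size());
  std::string Bundle = "CCOB";
  for (int I = 0; I < 4; ++I)
    Bundle.push_back(static_cast<char>((Size >> (8 * I)) & 0xFF));
  Bundle.append(Payload);
  return Bundle;
}
```

With two toy bundles separated by a few padding bytes, where the first payload contains `"CCOB"`, `splitBundles` returns both bundles intact: the embedded magic lies before the recorded end of the first bundle, so the search never sees it.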
## Testing
`clang/test/Driver/clang-offload-bundler-magic-collision.c` is a regression test
for exactly this collision. Manufacturing a payload whose *compressed* bytes
contain `"CCOB"` by tuning the input would be brittle, so the input is built
deterministically instead: a real compressed bundle gets a **zstd skippable
frame** spliced into its compressed region, and that frame's body is the bytes
`"CCOB"`. A skippable frame is defined by the zstd format to be ignored by the
decompressor, so the bundle still decompresses correctly — but the old
`find("CCOB", 4)` scan stops inside it. The committed `.co` and the small Python
generator used to produce it both live under `Inputs/`.
* Without the fix, `--list`/`--unbundle` on this input fail with
`Src size is incorrect`.
* With the fix, both succeed and enumerate/extract the targets.
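
For readers unfamiliar with skippable frames, the layout described above can be sketched in a few lines. This is an illustration in C++ of the frame format only (a magic in the range `0x184D2A50`..`0x184D2A5F`, a 4-byte little-endian length, then the user data, per the zstd format specification); it is not the Python generator from `Inputs/`, and `appendLE32` and `makeSkippableFrame` are hypothetical helper names. Where the frame is spliced into the bundle is left to the actual generator.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Appends V to Out as four little-endian bytes.
void appendLE32(std::string &Out, std::uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

// Builds a zstd skippable frame carrying UserData:
//   4 bytes  magic 0x184D2A50 (the low nibble may be any value 0x0..0xF)
//   4 bytes  length of UserData, little-endian
//   N bytes  UserData, which a conforming decoder skips
std::string makeSkippableFrame(std::string_view UserData) {
  std::string Frame;
  appendLE32(Frame, 0x184D2A50u);
  appendLE32(Frame, static_cast<std::uint32_t>(UserData.size()));
  Frame.append(UserData);
  return Frame;
}
```

`makeSkippableFrame("CCOB")` yields a 12-byte frame whose last four bytes are the magic string, which is enough to stop a `find("CCOB", 4)` scan wherever the frame sits in the compressed region.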
The existing `clang/test/Driver/clang-offload-bundler-multi-compress.c`
(multiple concatenated CCOB blobs) continues to pass, confirming the
multi-bundle path is unaffected.
---
*Assisted-by: Claude Opus (Cursor).*
https://github.com/llvm/llvm-project/pull/205587