[jira] [Resolved] (PDFBOX-6065) LZWFilter crashes, probably not handling the KwKwK special case

Tilman Hausherr (Jira) Wed, 17 Sep 2025 23:40:29 -0700


     [ 
https://issues.apache.org/jira/browse/PDFBOX-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tilman Hausherr resolved PDFBOX-6065.
-------------------------------------
      Assignee: Tilman Hausherr
    Resolution: Fixed

Thanks!

> LZWFilter crashes, probably not handling the KwKwK special case
> ---------------------------------------------------------------
>
>                 Key: PDFBOX-6065
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6065
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.34, 3.0.5 PDFBox
>            Reporter: Daniel Persson
>            Assignee: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.0.35, 3.0.6 PDFBox, 4.0.0
>
>         Attachments: elvis5.pdf, lzwfilter.patch
>
>
> The parsing throws an exception when trying to parse an image with the words 
> "The Legend" in the PDF.
> java.io.IOException: negative array index: -1 near offset 1
>     at org.apache.pdfbox.filter.LZWFilter.checkIndexBounds(LZWFilter.java:136)
>     at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:110)
>     at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:70)
>  
> I've not looked into the Lempel-Ziv algorithm since the 90s, so I'm not up to 
> date with all the papers that have been published. And also, I've never read 
> the original Welch paper:
> [https://courses.cs.duke.edu/spring03/cps296.5/papers/welch_1984_technique_for.pdf]
> But it seems that ChatGPT was able to find this paper and suggest a patch by 
> rewriting the function handling all cases, not needing the bounds check at 
> all. Not saying that this is the right solution to the problem, but I ran it 
> against our 50k pages from multiple publishers and newspapers without any 
> visual artifacts, and it also works with the example provided in this issue.
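
For readers unfamiliar with the KwKwK case named in the issue title: it occurs
when the decoder receives a code equal to the next free table index, i.e. a
code the encoder defined in the very step that emitted it. The entry must then
be rebuilt as the previous string plus its own first byte. The sketch below
illustrates this under simplifying assumptions: it works on already-unpacked
integer codes, ignores variable code widths and clear/EOD handling, and the
class and method names are hypothetical. It is not the attached
lzwfilter.patch and not PDFBox's LZWFilter.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of PDFBox.
class LzwKwKwKSketch {

    // Returns a copy of prefix with one extra byte appended.
    static byte[] append(byte[] prefix, byte b) {
        byte[] result = Arrays.copyOf(prefix, prefix.length + 1);
        result[prefix.length] = b;
        return result;
    }

    // Decodes a sequence of already-unpacked LZW codes.
    // Assumption: codes 256 (clear) and 257 (end of data) are reserved as in
    // PDF, but this sketch simply rejects them instead of handling them.
    static byte[] decode(int[] codes) {
        List<byte[]> table = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            table.add(new byte[] {(byte) i});
        }
        table.add(new byte[0]); // 256: placeholder for the clear-table code
        table.add(new byte[0]); // 257: placeholder for the end-of-data code

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = null;
        for (int code : codes) {
            byte[] entry;
            if (code >= 0 && code < table.size() && code != 256 && code != 257) {
                // Normal case: the code is already in the table.
                entry = table.get(code);
            } else if (code == table.size() && prev != null) {
                // KwKwK case: the code refers to the entry that is about to
                // be created, so it is the previous string plus its first byte.
                entry = append(prev, prev[0]);
            } else {
                throw new IllegalArgumentException("invalid LZW code: " + code);
            }
            out.write(entry, 0, entry.length);
            if (prev != null) {
                // New table entry: previous string plus first byte of current.
                table.add(append(prev, entry[0]));
            }
            prev = entry;
        }
        return out.toByteArray();
    }
}
```

With the input "aaaa", an encoder following this scheme would emit the codes
97, 258, 97. The decoder sees 258 before that entry exists in its table, which
is exactly the situation a plain table lookup cannot handle.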



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
