[ https://issues.apache.org/jira/browse/PDFBOX-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735909#comment-13735909 ]
Florent Guillaume commented on PDFBOX-1622:
-------------------------------------------
Thanks Andreas.
> TextNormalize init not thread-safe, may lead to infinite loop
> -------------------------------------------------------------
>
> Key: PDFBOX-1622
> URL: https://issues.apache.org/jira/browse/PDFBOX-1622
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.0.0
> Reporter: Florent Guillaume
> Assignee: Andreas Lehmkühler
> Fix For: 1.8.3, 2.0.0
>
> Attachments: PDFBOX-1622.patch.txt
>
>
> TextNormalize fills a static HashMap (DIACHASH) from a method
> (populateDiacHash) called by the TextNormalize constructor.
> If the constructor is called from two different threads at the same time,
> the HashMap may be written concurrently by both threads. HashMap is not
> thread-safe, and concurrent puts can corrupt its internal structure and
> cause infinite loops.
> We see the CPU at 100% and jstack shows 4 threads all stuck at:
> "Thread-2" prio=10 tid=0x00007f6e94499000 nid=0x347 runnable [0x00007f6e925d6000]
> java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.put(HashMap.java:391)
> at org.apache.pdfbox.util.TextNormalize.populateDiacHash(TextNormalize.java:82)
> at org.apache.pdfbox.util.TextNormalize.<init>(TextNormalize.java:41)
> at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:193)
> A patch to fix this is attached; it just moves the initialization to a static
> block.
> Please apply to the 1.8.3 and 2.0.0 branches.
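
For readers without the attachment, the static-block approach described above can be sketched roughly as follows. This is not the attached patch: the class name DiacriticTable, the method combiningFor, and the two sample entries are hypothetical and only illustrate the pattern. The JVM runs a class's static initializer once, under its class-initialization lock, so the map is fully built before any constructor can run.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TextNormalize; only the initialization pattern matters.
class DiacriticTable {
    // Filled exactly once during class initialization, never written afterwards.
    private static final Map<Integer, String> DIACHASH;

    static {
        Map<Integer, String> m = new HashMap<Integer, String>();
        // Illustrative entries only (spacing accent -> combining accent),
        // not copied from the PDFBox source.
        m.put(0x0060, "\u0300"); // grave accent
        m.put(0x00B4, "\u0301"); // acute accent
        // Wrap the map so later code cannot modify the shared state by accident.
        DIACHASH = Collections.unmodifiableMap(m);
    }

    DiacriticTable() {
        // The constructor no longer touches the static map, so creating
        // instances from several threads at once involves no shared writes.
    }

    // Read-only lookup; returns null when the code point has no mapping.
    String combiningFor(int codePoint) {
        return DIACHASH.get(codePoint);
    }
}
```

With this shape, concurrent construction only ever reads DIACHASH. Concurrent reads of a HashMap that is no longer modified are safe, and class initialization safely publishes the contents to all threads.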