[ 
https://issues.apache.org/jira/browse/PDFBOX-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764000#comment-17764000
 ] 

Axel Howind commented on PDFBOX-5681:
-------------------------------------

You are right. It would fix the crash, but the result might be incorrect.

I just came up with the following. I ran mvn verify and it looks good, but I 
have not checked whether or how it affects performance. What do you think?

{code:java}
    public List<COSObject> getObjectsByType(COSName type1, COSName type2)
    {
        List<COSObject> retval = new ArrayList<>();
        Set<COSObjectKey> processedKeys = new HashSet<>();
        // iterate over a copy so that the xref table may change while objects are loaded
        Set<COSObjectKey> remainingKeys = new HashSet<>(xrefTable.keySet());
        do {
            for (COSObjectKey objectKey : remainingKeys)
            {
                COSObject objectFromPool = getObjectFromPool(objectKey);
                COSBase realObject = objectFromPool.getObject();
                if( realObject instanceof COSDictionary )
                {
                    COSName dictType = ((COSDictionary) realObject).getCOSName(COSName.TYPE);
                    if (type1.equals(dictType) || (type2 != null && type2.equals(dictType)))
                    {
                        retval.add(objectFromPool);
                    }
                }
            }
            processedKeys.addAll(remainingKeys);
            remainingKeys = new HashSet<>(xrefTable.keySet());
            remainingKeys.removeAll(processedKeys);
        } while (!remainingKeys.isEmpty());
        return retval;
    }
{code}
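
The loop above can be tried in isolation with plain java.util collections. What follows is only a minimal sketch, not PDFBox code: the class RescanSketch, the methods load() and findMatches(), and the Integer/String table are hypothetical stand-ins for COSDocument, getObjectFromPool() and the xref table. It assumes that loading one entry may register another entry as a side effect.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RescanSketch
{
    // Hypothetical stand-in for loading an object: looking up key 1 has the
    // side effect of adding key 3 to the table.
    static String load(Map<Integer, String> table, Integer key)
    {
        if (key == 1)
        {
            table.putIfAbsent(3, "match");
        }
        return table.get(key);
    }

    // Same structure as the proposed getObjectsByType(): iterate over a copy
    // of the keys, then rescan for keys that appeared in the meantime.
    static List<Integer> findMatches(Map<Integer, String> table)
    {
        List<Integer> result = new ArrayList<>();
        Set<Integer> processed = new HashSet<>();
        Set<Integer> remaining = new HashSet<>(table.keySet());
        do
        {
            for (Integer key : remaining)
            {
                if ("match".equals(load(table, key)))
                {
                    result.add(key);
                }
            }
            processed.addAll(remaining);
            remaining = new HashSet<>(table.keySet());
            remaining.removeAll(processed);
        } while (!remaining.isEmpty());
        return result;
    }

    public static void main(String[] args)
    {
        Map<Integer, String> table = new HashMap<>();
        table.put(1, "match");
        table.put(2, "other");
        System.out.println(findMatches(table)); // prints [1, 3]
    }
}
```

Key 3 does not exist when the first pass starts, so it is only picked up by the second pass. Each pass copies the key set once, which is where any performance cost of this approach would come from.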


> ConcurrentModificationException in getObjectsByType() in 3.x
> ------------------------------------------------------------
>
>                 Key: PDFBOX-5681
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5681
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: PDFBOX-3714-2.pdf
>
>
> [~tilman]'s regression testing turned up this exception when we integrated 
> PDFBox 3.0.0 into Tika:
> {noformat}
> java.util.ConcurrentModificationException
>       at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
>       at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1620)
>       at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:254)
>       at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:240)
> {noformat}
> I can reproduce this exception consistently on the attached file with this 
> code:
> {noformat}
>         Path path = Paths.get("/.../PDFBOX-3714-2.pdf");
>         PDDocument document = Loader.loadPDF(path.toFile());
>         List<COSObject> objs = document.getDocument().getObjectsByType(COSName.FILESPEC);
> {noformat}
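
The stack trace in the description ends in HashMap$HashIterator.nextNode, which is what a HashMap does when an entry is added while its key set is being iterated. The sketch below reproduces that mechanism with a plain HashMap. It is an illustration only: the class CmeSketch and its method are hypothetical names, and the assumption that the PDFBox xref table is modified in the same way during iteration is inferred from the trace, not confirmed here.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class CmeSketch
{
    // Returns true if adding a key while iterating keySet() raised
    // ConcurrentModificationException.
    static boolean failsWhenModifiedDuringIteration()
    {
        Map<Integer, String> table = new HashMap<>();
        table.put(1, "a");
        table.put(2, "b");
        try
        {
            for (Integer key : table.keySet())
            {
                // structural modification of the map that is being iterated
                table.put(key + 100, "added during iteration");
            }
            return false;
        }
        catch (ConcurrentModificationException e)
        {
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(failsWhenModifiedDuringIteration()); // prints true
    }
}
```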



