[ 
https://issues.apache.org/jira/browse/PDFBOX-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781679#comment-16781679
 ] 

Andreas Lehmkühler commented on PDFBOX-4477:
--------------------------------------------

The original issue was about COSStrings that appeared to be equal because 
COSString uses the string it represents to calculate its hash for the 
HashSet. Maybe you should limit the newly introduced check to COSString 
objects, as only those seem to be affected.
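
To illustrate the effect, here is a minimal sketch. It does not use the real 
PDFBox classes: FakeString is a hypothetical stand-in that, as I assume 
COSString does, bases equals() and hashCode() only on the represented text. 
Two distinct instances with the same content collapse into one entry in a 
HashSet, while an identity-based set keeps both.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

public class EqualityDemo {

    // Hypothetical stand-in for COSString (an assumption, not PDFBox code):
    // equality and hash code depend only on the represented text.
    static final class FakeString {
        private final String text;

        FakeString(String text) {
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeString && ((FakeString) o).text.equals(text);
        }

        @Override
        public int hashCode() {
            return text.hashCode();
        }
    }

    // Adds two distinct instances with identical content and reports
    // how many entries the given set holds afterwards.
    static int sizeAfterAddingTwoEqualObjects(Set<Object> seen) {
        seen.add(new FakeString("abc"));
        seen.add(new FakeString("abc"));
        return seen.size();
    }

    public static void main(String[] args) {
        // Set based on equals()/hashCode(): the second instance is treated as a duplicate.
        int byEquals = sizeAfterAddingTwoEqualObjects(new HashSet<Object>());

        // Set based on reference identity: both instances are kept.
        int byIdentity = sizeAfterAddingTwoEqualObjects(
                Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()));

        System.out.println("HashSet size: " + byEquals);        // prints 1
        System.out.println("Identity set size: " + byIdentity); // prints 2
    }
}
```

If something like this is what happens in the SecurityHandler, an identity 
check restricted to COSString objects would be enough to tell such instances 
apart, and other object types could keep the existing behaviour.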

> Large encrypted file takes days to be parsed
> --------------------------------------------
>
>                 Key: PDFBOX-4477
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4477
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Crypto, Parsing
>    Affects Versions: 2.0.14
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: optimization
>             Fix For: 2.0.15, 3.0.0 PDFBox
>
>
> As reported by [~slavago] in TIKA-2832. File is confidential but I have it. 
> Initial findings:
> - File is AES256 encrypted with empty user password
> - File has about 1000 objects
> - File is a tagged PDF
> - HashMap in SecurityHandler grows to 100000?!
> - Using an IdentityHashMap speeds up the process dramatically (parsed in a 
> few seconds), and it may also be a better solution than what was done in 
> PDFBOX-4453
> Todo:
> - Read description of IdentityHashMap again
> - Find out why the HashMap grows so much. Could it be that identical objects 
> are stored twice? Or does the file have many direct objects?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
