[ https://issues.apache.org/jira/browse/PDFBOX-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-4477:
------------------------------------
    Description: 
As reported by [~slavago] in TIKA-2832. File is confidential but I have it. 
Initial findings:
- File is AES256 encrypted with empty user password
- File has about 1000 objects
- File is a tagged PDF
- HashMap in SecurityHandler grows to 100000?!
- Using an IdentityHashMap speeds up the process dramatically (parsed in a few 
seconds), and it may also be a better solution than what was done in PDFBOX-4453

Todo:
- Read description of IdentityHashMap again
- Find out why the HashMap grows so much. Could it be that identical objects 
are stored twice? Or does the file have many direct objects?
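
To illustrate the IdentityHashMap finding: a minimal sketch, assuming the tracked objects have a content-based equals()/hashCode() that walks their children. Node and IdentitySetSketch are hypothetical names, not PDFBox classes, and this is not the actual SecurityHandler code. It only shows how an equals-based set and an identity-based set differ in hashing cost and in how they treat equal but distinct objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class IdentitySetSketch {

    /** Hypothetical stand-in for a parsed object; NOT a PDFBox class. */
    static final class Node {
        static long hashCalls = 0;            // counts hashCode() invocations
        final List<Node> children = new ArrayList<>();

        @Override
        public boolean equals(Object o) {
            return o instanceof Node && children.equals(((Node) o).children);
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return children.hashCode();       // recurses into every descendant
        }
    }

    /** Builds a linear chain of depth + 1 nodes and returns all of them. */
    static List<Node> chain(int depth) {
        List<Node> all = new ArrayList<>();
        Node current = new Node();
        all.add(current);
        for (int i = 0; i < depth; i++) {
            Node child = new Node();
            current.children.add(child);
            current = child;
            all.add(current);
        }
        return all;
    }

    /** Returns either an identity-based set or a regular equals-based HashSet. */
    static Set<Node> newSet(boolean identity) {
        if (identity) {
            // Keys are compared with ==; Node.hashCode() is never consulted.
            return Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        }
        return new HashSet<Node>();
    }

    /** Adds every node of a chain to a "seen" set and reports hashCode() calls. */
    public static long hashCallsWhenTracking(int depth, boolean identity) {
        List<Node> nodes = chain(depth);
        Set<Node> seen = newSet(identity);
        Node.hashCalls = 0;
        for (Node n : nodes) {
            seen.add(n);
        }
        return Node.hashCalls;
    }

    /** Two distinct but equal nodes: one entry by equals(), two by identity. */
    public static int sizeAfterAddingTwoEqualNodes(boolean identity) {
        Set<Node> seen = newSet(identity);
        seen.add(new Node());
        seen.add(new Node());
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("HashSet hashCode calls:      " + hashCallsWhenTracking(200, false));
        System.out.println("Identity set hashCode calls: " + hashCallsWhenTracking(200, true));
        System.out.println("HashSet size, 2 equal nodes:      " + sizeAfterAddingTwoEqualNodes(false));
        System.out.println("Identity set size, 2 equal nodes: " + sizeAfterAddingTwoEqualNodes(true));
    }
}
```

With the equals-based set, every add() rehashes the whole subtree, so the cost grows roughly quadratically with nesting depth in this sketch. The identity-based set never calls the overridden hashCode(). Whether this matches the cause in the confidential file is an open question per the Todo list.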

  was:
As reported by [~slavago] in TIKA-2832. File is confidential but I have it. 
Initial findings:
- File is AES256 encrypted with empty user password
- File has about 1000 objects
- File is a tagged PDF
- HashMap in SecurityHandler grows to 100000?!
- Using an IdentityHashMap speeds up the process dramatically, and it may also 
be a better solution than what was done in PDFBOX-4453

Todo:
- Read description of IdentityHashMap again
- Find out why the HashMap grows so much. Could it be that identical objects 
are stored twice? Or does the file have many direct objects?


> Large encrypted file takes days to be parsed
> --------------------------------------------
>
>                 Key: PDFBOX-4477
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4477
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Crypto, Parsing
>    Affects Versions: 2.0.14
>            Reporter: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.15, 3.0.0 PDFBox
>
>
> As reported by [~slavago] in TIKA-2832. File is confidential but I have it. 
> Initial findings:
> - File is AES256 encrypted with empty user password
> - File has about 1000 objects
> - File is a tagged PDF
> - HashMap in SecurityHandler grows to 100000?!
> - Using an IdentityHashMap speeds up the process dramatically (parsed in a 
> few seconds), and it may also be a better solution than what was done in 
> PDFBOX-4453
>
> Todo:
> - Read description of IdentityHashMap again
> - Find out why the HashMap grows so much. Could it be that identical objects 
> are stored twice? Or does the file have many direct objects?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
