[ 
https://issues.apache.org/jira/browse/TIKA-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014098#comment-18014098
 ] 

Tim Allison commented on TIKA-4465:
-----------------------------------

Thank you, Tilman. Great catch on "OnInstantiate". It also looks like there's a 
StateEvent for 3D content during "Saving state data" and "Loading State data". 
I can't tell whether that's a pointer to an entry in the name tree's javascript 
entries or actual javascript like "OnInstantiate". Let's put that in another 
ticket later if anyone wants it.

For the requirement handler dictionary, it looks like there's an optional 
{{Script}} value that contains the name of the script (12.11.5 Table 276). That 
name points to an entry in the name tree's javascript entries...so I don't 
think the javascript is stored there, but I may be misreading the spec.
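
To make that reading concrete, here is a minimal sketch in plain Java of what 
I think the lookup amounts to, assuming the {{Script}} value really is just a 
key into the name tree's javascript entries. {{Node}}, {{collectScripts}} and 
{{resolveHandlerScript}} are hypothetical names for illustration only; this is 
an in-memory model, not the PDFBox or Tika API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NameTreeSketch {

    /** Hypothetical stand-in for a name-tree node: leaf pairs ("Names") and/or children ("Kids"). */
    static final class Node {
        final Map<String, String> names = new LinkedHashMap<>(); // script name -> javascript source
        final List<Node> kids = new ArrayList<>();
    }

    /** Walks the tree depth-first and gathers every name -> javascript pair. */
    static Map<String, String> collectScripts(Node root) {
        Map<String, String> scripts = new LinkedHashMap<>();
        // Identity-based "seen" set so a malformed, cyclic tree cannot loop forever.
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, scripts, seen);
        return scripts;
    }

    private static void walk(Node node, Map<String, String> scripts, Set<Node> seen) {
        if (node == null || !seen.add(node)) {
            return;
        }
        scripts.putAll(node.names);
        for (Node kid : node.kids) {
            walk(kid, scripts, seen);
        }
    }

    /** Resolves a requirement handler's Script name; returns null when there is no such entry. */
    static String resolveHandlerScript(Node root, String scriptName) {
        if (scriptName == null) {
            return null;
        }
        return collectScripts(root).get(scriptName);
    }

    public static void main(String[] args) {
        Node leaf = new Node();
        leaf.names.put("init", "app.alert('hello');");
        Node root = new Node();
        root.kids.add(leaf);
        System.out.println(collectScripts(root));
        System.out.println(resolveHandlerScript(root, "init"));
    }
}
```

If this reading is right, extracting the name tree's javascript entries would 
already cover whatever a requirement handler refers to, and the handler 
dictionary would need no separate extraction.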

Separately, it turns out that two of our existing unit test files contain 
javascript in the name trees: {{testPDFPackage.pdf}} and 
{{testPDF_XFA_govdocs1_258578.pdf}}. So we won't need to find unit test 
files.

> Extract javascript from name dictionary in PDFs
> -----------------------------------------------
>
>                 Key: TIKA-4465
>                 URL: https://issues.apache.org/jira/browse/TIKA-4465
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> This blog 
> [https://labs.senhasegura.blog/unmasking-the-threat-a-deep-dive-into-the-pdf-malicious-2/]
>  mentions this malware file (be careful! dangerous!): 
> [https://bazaar.abuse.ch/download/4dc9b0c20ea61d91d6a1b5bdce76fb5365de0762efb8f6c2925113c6a8950cae/]
>  
>  
> We're currently extracting javascript from actions, but not from the name 
> tree (document-level javascript).
>  
> We should add this extraction if "extractActions" is set to true... or 
> better, come up with a better name for that variable in trunk.
>  
> Related to this, I'd also like to extract javascript in TikaCLI by default as 
> we do for extracting inline images and incremental updates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
