[ 
https://issues.apache.org/jira/browse/TIKA-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711101#comment-17711101
 ] 

Tim Allison edited comment on TIKA-4012 at 4/12/23 2:05 PM:
------------------------------------------------------------

According to the spec, these are some of the places where Associated Files 
(/AF) may appear in PDF 2.x:
{noformat}
• the PDF document catalog dictionary (14.13.3, "Associated files linked to the PDF document’s catalog")
• a page dictionary (14.13.4, "Associated files linked to a page dictionary")
• a graphics object (using a marked-content property list dictionary, 14.13.5, "Associated files linked to graphics objects")
• a structure element dictionary (14.13.6, "Associated files linked to structure elements")
• an XObject dictionary (14.13.7, "Associated files linked to XObjects")
• a DParts dictionary (14.13.8, "Associated files linked to DParts")
• an annotation dictionary (14.13.9, "Associated files linked to an annotation dictionary")
• a metadata stream dictionary (14.3.2, "Metadata streams")
{noformat}

Oh, I had forgotten about this: 
https://www.pdfa.org/wp-content/uploads/2018/10/PDF20_AN002-AF.pdf

It suggests that, in PDF 2.0, /AF can appear anywhere, not just in the locations listed above.
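
If /AF really can show up in any dictionary, one option is to walk the whole object graph rather than checking a fixed list of locations. The following is only a rough sketch of that idea, under loud assumptions: PDF dictionaries are modelled as plain Map<String, Object> and PDF arrays as List<Object>, and the class AfScanner and method findAfPaths are hypothetical names. None of this is the PDFBox or Tika API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: dictionaries are Map<String, Object>, arrays are
 * List<Object>. A stand-in for a real PDF object model, not a real API.
 */
public class AfScanner {

    /** Returns the path of every dictionary entry named "AF" reachable from root. */
    public static List<String> findAfPaths(Map<String, Object> root) {
        List<String> hits = new ArrayList<>();
        // Identity-based "seen" set: PDF object graphs contain cycles
        // (for example a page's /Parent entry pointing back up the tree).
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, "", seen, hits);
        return hits;
    }

    private static void walk(Object node, String path, Set<Object> seen, List<String> hits) {
        if (node instanceof Map<?, ?>) {
            if (!seen.add(node)) {
                return; // already visited, do not loop forever
            }
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String childPath = path + "/" + e.getKey();
                if ("AF".equals(e.getKey())) {
                    hits.add(childPath);
                }
                walk(e.getValue(), childPath, seen, hits);
            }
        } else if (node instanceof List<?>) {
            if (!seen.add(node)) {
                return;
            }
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(items.get(i), path + "[" + i + "]", seen, hits);
            }
        }
        // Anything else (strings, numbers, ...) is a leaf in this toy model.
    }

    public static void main(String[] args) {
        Map<String, Object> catalog = new LinkedHashMap<>();
        Map<String, Object> pages = new LinkedHashMap<>();
        Map<String, Object> page = new LinkedHashMap<>();

        catalog.put("AF", Arrays.asList("filespec-1")); // /AF on the catalog
        page.put("AF", Arrays.asList("filespec-2"));    // /AF on a page
        page.put("Parent", pages);                      // cycle back to the page tree
        pages.put("Kids", Arrays.asList(page));
        catalog.put("Pages", pages);

        System.out.println(findAfPaths(catalog)); // prints [/AF, /Pages/Kids[0]/AF]
    }
}
```

A real implementation would need to resolve indirect references and would probably want an explicit stack instead of recursion, since object graphs in real PDFs can be deep.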


> Improve extraction of embedded documents in PDFs
> ------------------------------------------------
>
>                 Key: TIKA-4012
>                 URL: https://issues.apache.org/jira/browse/TIKA-4012
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: pdfbox-new-attachments-reports.tgz
>
>
> We're currently processing the EmbeddedFiles entry in the name tree and 
> annotations to look for file spec dictionaries. Unfortunately, PDFs may embed 
> files in lots of other places. The newly freely available PDF 2.0 spec makes 
> this abundantly and painfully clear. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
