[ 
https://issues.apache.org/jira/browse/PDFBOX-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323809#comment-17323809
 ] 

Tim Allison commented on PDFBOX-5166:
-------------------------------------

Completely unsurprisingly, [~tilman] has already shown how to extract these 
files on SO: 
https://stackoverflow.com/questions/45460027/what-is-the-best-way-to-extract-embedded-flash-file-from-a-pdf-using-the-pdfbox

If this is a "not going to fix", no problem!  I'm happy to put that code into 
Tika for now, and if a RichMedia annotation gets implemented in PDFBox, I can 
update our code accordingly.

> Implement RichMedia annotation
> ------------------------------
>
>                 Key: PDFBOX-5166
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5166
>             Project: PDFBox
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: testFlashInPDF.pdf
>
>
> See TIKA-3359.  The attached file has an embedded Flash/swf file.  Tika is not 
> currently extracting the embedded file.
> In the debugger, I can see the Annotation as a PDAnnotationUnknown.  In the 
> COSDictionary, I can see the subtype is "RichMedia".  If someone has the 
> time, it'd be great to implement this so that we can extract more attachments 
> in Tika...  Obv, others may find use too. :D
> Many thanks to Tyler Thorsted for the test file and many thanks to 
> @terminalboredom and @beet_keeper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
