[ https://issues.apache.org/jira/browse/PDFBOX-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323809#comment-17323809 ]
Tim Allison commented on PDFBOX-5166:
-------------------------------------
Completely unsurprisingly, [~tilman] has already shown how to extract these
files on SO:
https://stackoverflow.com/questions/45460027/what-is-the-best-way-to-extract-embedded-flash-file-from-a-pdf-using-the-pdfbox
If this is a "not going to fix", no problem! I'm happy to put that code into
Tika for now, and if a RichMedia annotation gets implemented in PDFBox, I can
update our code accordingly.
> Implement RichMedia annotation
> ------------------------------
>
> Key: PDFBOX-5166
> URL: https://issues.apache.org/jira/browse/PDFBOX-5166
> Project: PDFBox
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: testFlashInPDF.pdf
>
>
> See TIKA-3359. The attached file has an embedded Flash (SWF) file. Tika is not
> currently extracting the embedded file.
> In the debugger, I can see the Annotation as a PDAnnotationUnknown. In the
> COSDictionary, I can see the subtype is "RichMedia". If someone has the
> time, it'd be great to implement this so that we can extract more attachments
> in Tika... Obv, others may find use too. :D
> Many thanks to Tyler Thorsted for the test file and many thanks to
> @terminalboredom and @beet_keeper.