[
https://issues.apache.org/jira/browse/TIKA-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701203#comment-14701203
]
Damiano commented on TIKA-1715:
-------------------------------
I would also add that the code is extremely slow: the file is only 400 kB, yet
processing it takes around 22 seconds!
> Save embedded images into another location
> ------------------------------------------
>
> Key: TIKA-1715
> URL: https://issues.apache.org/jira/browse/TIKA-1715
> Project: Tika
> Issue Type: Test
> Components: metadata
> Affects Versions: 1.10
> Reporter: Damiano
> Labels: newbie
>
> Hello,
> I am having a strange problem dealing with embedded images.
> This is my code:
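> {code:java}
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.List;
>
> import org.apache.tika.exception.TikaException;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.Parser;
> import org.apache.tika.parser.RecursiveParserWrapper;
> import org.apache.tika.parser.pdf.PDFParserConfig;
> import org.apache.tika.sax.BasicContentHandlerFactory;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.SAXException;
>
> public void getImages() throws IOException, TikaException, SAXException {
>
>     try (InputStream stream = new FileInputStream(this.fileName)) {
>         // Collect one Metadata object per document (container + embedded);
>         // the IGNORE handler type skips text extraction.
>         RecursiveParserWrapper p = new RecursiveParserWrapper(
>                 new AutoDetectParser(),
>                 new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.IGNORE, -1));
>
>         // Ask the PDF parser to emit inline images as embedded documents.
>         ParseContext context = new ParseContext();
>         PDFParserConfig config = new PDFParserConfig();
>         config.setExtractInlineImages(true);
>         config.setExtractUniqueInlineImagesOnly(true);
>         context.set(PDFParserConfig.class, config);
>         context.set(Parser.class, p);
>
>         p.parse(stream, new BodyContentHandler(-1), new Metadata(), context);
>
>         // The first element is the container PDF; index 1 is the first embedded document.
>         List<Metadata> metadatas = p.getMetadata();
>
>         //FileInputStream f = new FileInputStream(metadatas.get(1).get("File Name"));
>         try (FileInputStream f = new FileInputStream("/tmp/" + metadatas.get(1).get("File Name"))) {
>             System.out.println(f.available());
>         }
>     }
> }
> {code}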
> I can get the names of the embedded images with get("File Name"), but the
> resulting path does not seem to be valid.
> I need to save all the embedded images (inline images) to another location.
> Thank you in advance!
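A minimal sketch under stated assumptions, not an official Tika API: RecursiveParserWrapper collects one Metadata object per document but does not write the embedded files to disk, so a name read from that metadata does not point to an existing file under /tmp. In Tika 1.x the raw bytes of each embedded document are passed to the EmbeddedDocumentExtractor registered on the ParseContext, via its parseEmbedded(InputStream, ContentHandler, Metadata, boolean) method. The helper below is stdlib-only and covers just the part such a custom extractor could delegate to: choosing a safe file name inside a target directory and copying the stream there. The class EmbeddedImageSaver and its methods targetPath and save are hypothetical names; the assumption is that an extractor would call save with the embedded stream and a name such as metadata.get(Metadata.RESOURCE_NAME_KEY).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: writes one embedded stream into an output directory.
public class EmbeddedImageSaver {

    // Builds a path inside outputDir. The embedded name may be null or contain
    // directory parts, so only its last segment is kept; if nothing usable
    // remains, a counter-based fallback name is used instead.
    static Path targetPath(Path outputDir, String resourceName, int index) {
        String name = null;
        if (resourceName != null) {
            // Treat both separators alike so "../x.png" or "dir\x.png" cannot escape outputDir.
            String normalized = resourceName.replace('\\', '/');
            name = normalized.substring(normalized.lastIndexOf('/') + 1);
        }
        if (name == null || name.isEmpty() || name.equals(".") || name.equals("..")) {
            name = "embedded-" + index;
        }
        return outputDir.resolve(name);
    }

    // Creates outputDir if needed, copies the stream to disk and returns the written path.
    static Path save(InputStream stream, Path outputDir, String resourceName, int index)
            throws IOException {
        Files.createDirectories(outputDir);
        Path target = targetPath(outputDir, resourceName, index);
        Files.copy(stream, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("embedded");
        byte[] fakeImage = "not really a JPEG".getBytes(StandardCharsets.UTF_8);
        Path written = save(new ByteArrayInputStream(fakeImage), dir, "image0.jpg", 0);
        System.out.println(written.getFileName() + " " + Files.size(written));
    }
}
```

Because REPLACE_EXISTING is used, two embedded images that report the same name would overwrite each other; passing a per-document counter and building the file name from it unconditionally would avoid that.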
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)