[ https://issues.apache.org/jira/browse/PDFBOX-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160891#comment-16160891 ]
Hasan Karaoğlu commented on PDFBOX-3926:
----------------------------------------

So, how can we find the position (x, y) of an image on a page programmatically? For example, I can extract the images from a PDF file with the code below, but I don't know how to detect their positions.

{code:java}
PDResources pdResources = pdPage.getResources();
int i = 0;
for (COSName c : pdResources.getXObjectNames()) {
    try {
        PDXObject o = pdResources.getXObject(c);
        if (o instanceof org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) {
            // Write the image to a temporary PNG file
            String pageFileName = temporaryHtmlFilePath + "_" + imageId + ".png";
            File file = new File(pageFileName);
            ImageIO.write(((PDImageXObject) o).getImage(), "png", file);
            // Embed the image in the HTML page as Base64, then delete the temporary file
            String imageBase64Code = Base64Utils.encodeFile(pageFileName);
            pageDoc.body().child(0).child(0).prepend(getImageHtmlTemplate(imageBase64Code));
            FileUtils.delete(pageFileName);
            imageId++;
        }
    } catch (IOException ex) {
        Logger.getLogger(PdfToHtmlConverter.class.getName()).log(Level.SEVERE, null, ex);
    }
}
{code}

> ExtractImages
> --------------
>
>                 Key: PDFBOX-3926
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3926
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Hasan Karaoğlu
>
> Hi, I extract text from a PDF with the command below, but it doesn't extract images.
> So I also use the extract-images command. But how can we merge these two outputs
> sequentially?
> Extract Texts: (First command)
> {code:java}
> java -jar pdfbox.jar ExtractText -html {{inputFileName}} -startPage {{startPage}} -endPage {{endPage}} -encoding UTF-8 {{outputFileName}}
> {code}
> Extract Images: (Second command)
> {code:java}
> java -jar pdfbox-app.jar ExtractImages [OPTIONS] <inputfile>
> {code}
> For example, I run the first command and get an output.html file, but this file
> contains only the text parts of the page. There are no images. Then I run the second
> command and get each image as a file. How can I merge these two separate outputs?
> The order of the elements on the page is important.
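
One possible way to think about both questions, offered as a rough sketch rather than a PDFBox feature: in a PDF content stream an image is painted into the unit square, and the current transformation matrix (CTM) [a b c d e f] in effect at the "Do" operator maps that square onto the page. The PrintImageLocations example shipped with PDFBox reads this matrix from a PDFStreamEngine subclass. Once each text run and each image has a position, the two outputs could be merged by sorting top to bottom. The sketch below uses only java.awt.geom; the class PageOrderSketch, the Element type, and the sample coordinates are all hypothetical, and it assumes a simple single-column layout.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from PDFBox.
class PageOrderSketch {

    /** One piece of page content in PDF user space (origin at the bottom-left). */
    static final class Element {
        final String label;
        final double x;    // left edge
        final double top;  // top edge; a larger value is higher on the page

        Element(String label, double x, double top) {
            this.label = label;
            this.x = x;
            this.top = top;
        }
    }

    /** Bounding box of an image: the unit square mapped through the CTM [a b c d e f]. */
    static Rectangle2D imageBounds(AffineTransform ctm) {
        return ctm.createTransformedShape(new Rectangle2D.Double(0, 0, 1, 1)).getBounds2D();
    }

    /** Sorts a copy of the elements top to bottom, then left to right. */
    static List<Element> readingOrder(List<Element> elements) {
        List<Element> sorted = new ArrayList<>(elements);
        sorted.sort(Comparator.comparingDouble((Element e) -> e.top)
                .reversed()                       // the PDF y axis points up
                .thenComparingDouble(e -> e.x));
        return sorted;
    }

    public static void main(String[] args) {
        // AffineTransform takes its six values in the same order as a PDF matrix.
        // Assumed sample CTM: a 200 x 100 image whose lower-left corner is at (72, 500).
        Rectangle2D box = imageBounds(new AffineTransform(200, 0, 0, 100, 72, 500));

        // Assumed sample text positions, as a text extractor might report them.
        List<Element> page = Arrays.asList(
                new Element("text: caption", 72, 480),
                new Element("image", box.getX(), box.getMaxY()),
                new Element("text: heading", 72, 700));

        for (Element e : readingOrder(page)) {
            System.out.println(e.label + " at x=" + e.x + ", top=" + e.top);
        }
    }
}
```

Sorting by the top edge is only a heuristic: multi-column pages, rotated pages, or images nested inside form XObjects would need more careful handling.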