[jira] [Commented] (PDFBOX-3926) ExtractImages

JIRA Mon, 11 Sep 2017 01:08:26 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160891#comment-16160891
 ]


Hasan Karaoğlu commented on PDFBOX-3926:
----------------------------------------

So, how can we determine the position (x, y) of an image on a page programmatically?

For example, I can extract the images from a PDF file with the code below, but I don't 
know how to detect their positions.

{code:java}
// pdPage, temporaryHtmlFilePath, imageId and pageDoc are defined elsewhere.
PDResources pdResources = pdPage.getResources();

for (COSName c : pdResources.getXObjectNames()) {
    try {
        PDXObject o = pdResources.getXObject(c);

        // Only image XObjects are of interest; form XObjects are skipped.
        if (o instanceof PDImageXObject) {
            // Write the image to a temporary PNG file.
            String pageFileName = temporaryHtmlFilePath + "_" + imageId + ".png";
            File file = new File(pageFileName);
            ImageIO.write(((PDImageXObject) o).getImage(), "png", file);

            // Embed the image in the HTML page as Base64, then delete the temporary file.
            String imageBase64Code = Base64Utils.encodeFile(pageFileName);
            pageDoc.body().child(0).child(0).prepend(getImageHtmlTemplate(imageBase64Code));
            FileUtils.delete(pageFileName);
            imageId++;
        }
    } catch (IOException ex) {
        Logger.getLogger(PdfToHtmlConverter.class.getName()).log(Level.SEVERE, null, ex);
    }
}
{code}
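
One way to think about the position question: a PDF content stream paints every image into the unit square (0,0)-(1,1) and places it with the current transformation matrix (CTM), so mapping that square through the CTM gives the image's rectangle on the page. The sketch below shows only that arithmetic with java.awt.geom; the class name ImagePlacement, the method boundsOf and the sample matrix are made up for illustration. As far as I know, in PDFBox 2.x the CTM can be read inside a PDFStreamEngine subclass while the Do operator is processed (the PrintImageLocations example shipped with PDFBox does this), but that wiring is not shown here and should be checked against the PDFBox version in use.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

class ImagePlacement {

    /**
     * Maps the unit square, into which PDF paints every image, through the
     * given CTM and returns the bounding box in PDF user space units
     * (by default the origin is the lower-left corner and y grows upward).
     */
    static Rectangle2D boundsOf(AffineTransform ctm) {
        Rectangle2D unitSquare = new Rectangle2D.Double(0, 0, 1, 1);
        return ctm.createTransformedShape(unitSquare).getBounds2D();
    }

    public static void main(String[] args) {
        // Hypothetical CTM, as set by a content stream line "200 0 0 100 72 500 cm".
        // AffineTransform takes its six values in the same a b c d e f order.
        AffineTransform ctm = new AffineTransform(200, 0, 0, 100, 72, 500);
        Rectangle2D box = boundsOf(ctm);
        System.out.println("x=" + box.getX() + " y=" + box.getY()
                + " width=" + box.getWidth() + " height=" + box.getHeight());
    }
}
```

With the sample matrix, the image covers a 200 x 100 box whose lower-left corner is at (72, 500). Taking the bounding box of the transformed square, rather than reading e and f directly, also gives a usable rectangle when the image is rotated or flipped.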


> ExtractImages 
> --------------
>
>                 Key: PDFBOX-3926
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3926
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Hasan Karaoğlu
>
> Hi, I extract text from a PDF with the command below, but it does not extract images. 
> So I also use the ExtractImages command. But how can we merge the two outputs 
> sequentially?
> Extract Texts: (First command)
> {code:java}
> java -jar pdfbox.jar ExtractText -html {{inputFileName}} -startPage {{startPage}} -endPage {{endPage}} -encoding UTF-8 {{outputFileName}}
> {code}
> Extract Images: (Second command)
> {code:java}
> java -jar pdfbox-app.jar ExtractImages [OPTIONS] <inputfile>
> {code}
> For example, I run the first command and get an output.html file, but this file 
> contains only the text of the page; there are no images. Then I run the second 
> command and get the images as files. How can I merge these two separate outputs? 
> The order of elements on the page is important. 
>  
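
Regarding the merge question in the description: if a position is known for every text block and every image, the two outputs could be interleaved by sorting on page position. The following is only a rough sketch under that assumption. The Element type, the merge method and the sample coordinates are hypothetical, and it assumes both kinds of element have already been converted to the same coordinate system with y growing upward; it is not how ExtractText or ExtractImages work internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ReadingOrderMerge {

    /** Hypothetical holder for one piece of page content and its anchor point. */
    static final class Element {
        final String html;
        final double x;
        final double y; // assumed: y grows upward, so larger y is higher on the page

        Element(String html, double x, double y) {
            this.html = html;
            this.x = x;
            this.y = y;
        }
    }

    /** Combines text and image elements, ordered top to bottom, then left to right. */
    static List<Element> merge(List<Element> texts, List<Element> images) {
        List<Element> all = new ArrayList<>(texts);
        all.addAll(images);
        all.sort(Comparator.comparingDouble((Element e) -> e.y).reversed()
                .thenComparingDouble(e -> e.x));
        return all;
    }

    public static void main(String[] args) {
        List<Element> texts = Arrays.asList(
                new Element("<p>First paragraph</p>", 72, 700),
                new Element("<p>Second paragraph</p>", 72, 400));
        List<Element> images = Arrays.asList(
                new Element("<img src=\"image-1.png\"/>", 72, 600));

        // The image lands between the two paragraphs because of its y value.
        for (Element e : merge(texts, images)) {
            System.out.println(e.html);
        }
    }
}
```

A plain sort like this breaks down on multi-column layouts, where reading order is not simply top to bottom, so it should be treated as a starting point only.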



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
