[jira] Commented: (PDFBOX-698) Unable parse images from PDF documents concated by tex

Radim Hatlapatka (JIRA) Tue, 20 Apr 2010 15:22:13 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859096#action_12859096
 ]


Radim Hatlapatka commented on PDFBOX-698:
-----------------------------------------

This is the essential part of the code I use for extracting images from a PDF 
(I also catch exceptions and close streams, but I left that out here).
I think this code is correct, but it doesn't recognize the images in the PDF 
documents described in this issue.


// loading pdfFile as PDDocument
        PDDocument document = null;
        try {
            document = PDDocument.load(inputStream);

            AccessPermission accessPermissions = document.getCurrentAccessPermission();

            if (!accessPermissions.canExtractContent()) {
                throw new PdfRecompressionException(
                        "Error: You do not have permission to extract images.");
            }

            // going page by page
            List pages = document.getDocumentCatalog().getAllPages();
            for (int pageNumber = 0; pageNumber < pages.size(); pageNumber++) {
               
                PDPage page = (PDPage) pages.get(pageNumber);
                PDResources resources = page.getResources();



                // reading images from each page and saving them to file
                // (name of file is saved in list namesOfImages)
                Map images = resources.getImages();
                if (images != null) {
                    Iterator imageIter = images.keySet().iterator();
                    while (imageIter.hasNext()) {
                        String key = (String) imageIter.next();
                        PDXObjectImage image = (PDXObjectImage) images.get(key);
                                        
                        String name = getUniqueFileName(prefix + key, image.getSuffix());
                        System.out.println("Writing image:" + name);
                        image.write2file(name);
                    }
                }
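            }
        } finally {
            // always release the document's resources
            if (document != null) {
                document.close();
            }
        }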
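
The snippet above calls getUniqueFileName, which is not shown in this comment. 
A minimal sketch of what such a helper might look like follows. It assumes the 
helper returns a base name without the extension and that the extension is 
appended when the image is written. The class name and the "-N" counter 
scheme are hypothetical and are not taken from the reporter's code.

```java
import java.io.File;

class UniqueFileNames {

    // Hypothetical stand-in for the helper used in the snippet above.
    // Returns a base name such that base + "." + suffix does not exist yet
    // (assumption: the caller appends the extension when writing the file).
    static String getUniqueFileName(String prefix, String suffix) {
        String base = prefix;
        int counter = 1;
        // Append "-1", "-2", ... until no file with that name exists.
        while (new File(base + "." + suffix).exists()) {
            base = prefix + "-" + counter;
            counter++;
        }
        return base;
    }

    public static void main(String[] args) {
        // Example call with an illustrative prefix and suffix.
        System.out.println(getUniqueFileName("page1-Im0", "png"));
    }
}
```

The existence check and the later write are not atomic, so this sketch is only 
suitable when a single process writes to the output directory.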

> Unable parse images from PDF documents concated by tex
> ------------------------------------------------------
>
>                 Key: PDFBOX-698
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-698
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 0.8.0-incubator, 1.0.0, 1.1.0
>         Environment: Using jdk 1.6 in Ubuntu 8.10 (using IDE netbeans 6.5)
>            Reporter: Radim Hatlapatka
>         Attachments: item.pdf
>
>
> Unable to extract images from a PDF document created by concatenating other 
> PDF documents using tex (if concatenated by pdftk then it works fine, but 
> if concatenated by tex it doesn't find any).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
