[iText-questions] Extract Images from existing pdf - not working on every pdf

Rene Weiss Tue, 16 Sep 2008 06:28:16 -0700

Hi guys,


I have read through the mailing list and found some interesting code by
Bruno Lowagie on how to extract images from existing PDFs.

I only need the image for further processing. The code does work, but
not on all the PDFs I have tested.

Actually, I am only interested in one image per PDF. Each file contains
only one, at most two, images, although more than that get exported.
That isn't a problem, because I can easily find the right one by its
size, or by converting it internally to an Image and catching the
exception that indicates it isn't a supported image.
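
A minimal sketch of the "convert it and see whether it is a supported image" check described above, using only the JDK. The class name ImageProbe, both method names, and the size thresholds are hypothetical, introduced for illustration. Note that javax.imageio.ImageIO.read returns null when no registered reader recognises the data, so the sketch treats null and an exception the same way.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of iText: decides whether extracted bytes
// are an image the JDK can decode, and whether it is big enough to be
// the one of interest.
final class ImageProbe {

    /** Returns the decoded image, or null if ImageIO cannot read the bytes. */
    static BufferedImage tryDecode(byte[] data) {
        try {
            // ImageIO.read returns null when no registered reader claims the data.
            return ImageIO.read(new ByteArrayInputStream(data));
        } catch (IOException | RuntimeException e) {
            // Corrupt or unsupported data: treat it as "not a supported image".
            return null;
        }
    }

    /** True if the image decoded and meets the (assumed) minimum dimensions. */
    static boolean isLargeEnough(BufferedImage img, int minWidth, int minHeight) {
        return img != null && img.getWidth() >= minWidth && img.getHeight() >= minHeight;
    }
}
```

With this, each extracted byte array could be passed to ImageProbe.tryDecode and kept only when ImageProbe.isLargeEnough returns true for whatever minimum size fits the images in question.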

 

I prepared 2 files for you. One is working, the other one isn't:

http://web46131.server46.mivitec.net/public_data/3980491.pdf - this one
isn't working

http://web46131.server46.mivitec.net/public_data/3287348.pdf - this one
is working like a charm

 

Here is the code I use to extract the images:

 

import com.lowagie.text.pdf.*;
import java.awt.*;
import java.io.File;
import java.io.FileOutputStream;

public class ExtractPDFImage {
    public static void main(String[] args) throws Exception {
        PdfReader reader;

        String path = "D:\\TEMP\\";
        int fileCount = 0;

        File dir = new File(path);

        for (File file : dir.listFiles()) {
            if (file.isFile() && file.canRead()) {
                reader = new PdfReader(file.getAbsolutePath());

                fileCount++;

                for (int i = 0; i < reader.getXrefSize(); i++) {
                    PdfObject pdfobj = reader.getPdfObject(i);

                    if (pdfobj != null) {
                        if (pdfobj.isStream()) {
                            PdfStream stream = (PdfStream) pdfobj;
                            PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
                            if (pdfsubtype != null) {
                                // PDF Subtype OK
                                if (pdfsubtype.toString().equals(PdfName.IMAGE.toString())) {
                                    // image found
                                    byte[] img = PdfReader.getStreamBytesRaw((PRStream) stream);

                                    FileOutputStream out = new FileOutputStream(
                                            new File(path + "jpg\\" + fileCount + "_" + i + ".jpg"));
                                    out.write(img);
                                    out.flush();
                                    out.close();
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

 

I have no clue what the difference is in the files that don't work,
because the production process is always the same!
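
One thing that may be worth checking: the code above writes every image stream's raw bytes to a .jpg file, and those bytes are only a JPEG file when the stream is stored with the DCTDecode filter. Whether that explains the difference between the two sample files is an assumption and has not been verified against them. The sketch below is a JDK-only way to test the dumped bytes for the JPEG start marker (FF D8 FF); the class StreamSniffer, its method names, and the ".bin" fallback extension are hypothetical.

```java
// Hypothetical helper, not part of iText: inspects the first bytes of an
// extracted stream to see whether they look like JPEG data.
final class StreamSniffer {

    /** True if the bytes begin with the JPEG start-of-image marker FF D8 FF. */
    static boolean looksLikeJpeg(byte[] data) {
        return data != null && data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    /** Picks a file extension: ".jpg" for JPEG-looking data, ".bin" otherwise (assumed fallback). */
    static String extensionFor(byte[] data) {
        return looksLikeJpeg(data) ? ".jpg" : ".bin";
    }
}
```

Running StreamSniffer.looksLikeJpeg on the byte array obtained for each image in the failing file would show whether the output is JPEG data at all, or some other encoding that merely carries a .jpg name.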

 

Thank you very much for your help. Yours, Rene
