[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files

Andriy (JIRA) Wed, 06 Feb 2013 07:51:15 -0800

     [ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andriy updated PDFBOX-1510:
---------------------------

    Summary: PDF gets corrupted when extracting it from the embedded files  
(was: PDF gets corrupted when trying to extract it from the embedded files)
    
> PDF gets corrupted when extracting it from the embedded files
> -------------------------------------------------------------
>
>                 Key: PDFBOX-1510
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1510
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: Andriy
>            Priority: Critical
>         Attachments: doesnt_work.pdf, PDFEmbeddedFiles.java, works2.pdf
>
>
> When a PDF is attached to another PDF, it gets corrupted when retrieved
> through the PDEmbeddedFile.getByteArray() method call. For some reason the
> returned array contains less data than the original file that was attached to
> the PDF.
> This affects some documents but not others. Below is the test code
> that replicates the issue.
> A PDF whose attachment gets corrupted is attached to this issue.
> import java.io.File;
> import java.io.FileOutputStream;
> import java.util.Map;
>
> import org.apache.pdfbox.exceptions.CryptographyException;
> import org.apache.pdfbox.exceptions.InvalidPasswordException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
> import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
> import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
> import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;
>
> public class PDFEmbeddedFiles {
>     private PDFEmbeddedFiles() {
>     }
>
>     public static void main(String[] args) throws Exception {
>         if (args.length != 1) {
>             usage();
>             System.exit(1);
>         }
>         PDDocument document = null;
>         try {
>             File pdfFile = new File(args[0]);
>             document = PDDocument.load(pdfFile);
>             if (document.isEncrypted()) {
>                 try {
>                     // Try the empty user password.
>                     document.decrypt("");
>                 } catch (InvalidPasswordException e) {
>                     System.err.println("Error: The document is encrypted.");
>                 } catch (CryptographyException e) {
>                     e.printStackTrace();
>                 }
>             }
>
>             // The catalog may have no name dictionary at all.
>             PDDocumentNameDictionary namesDictionary =
>                     document.getDocumentCatalog().getNames();
>             PDEmbeddedFilesNameTreeNode efTree =
>                     namesDictionary == null ? null : namesDictionary.getEmbeddedFiles();
>             // Guard against a missing tree or a missing names map.
>             Map<String, Object> names = efTree == null ? null : efTree.getNames();
>             if (names != null) {
>                 for (Map.Entry<String, Object> entry : names.entrySet()) {
>                     String filename = entry.getKey();
>                     PDComplexFileSpecification fileSpec =
>                             (PDComplexFileSpecification) entry.getValue();
>                     PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
>
>                     // Write the attachment into the current working directory.
>                     File file = new File(filename);
>                     System.out.println("Writing " + filename);
>                     FileOutputStream fos = new FileOutputStream(file);
>                     try {
>                         fos.write(embeddedFile.getByteArray());
>                     } finally {
>                         // Close the stream even if the write fails.
>                         fos.close();
>                     }
>                 }
>             }
>         } finally {
>             if (document != null) {
>                 document.close();
>             }
>         }
>     }
>
>     /**
>      * This will print the usage for this program.
>      */
>     private static void usage() {
>         System.err.println("Usage: java "
>                 + PDFEmbeddedFiles.class.getName() + " <input-pdf>");
>     }
> }
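
To confirm the symptom described above (the extracted file is shorter than the file that was originally attached), the two files can be compared byte by byte. The following is a minimal sketch that uses only the JDK and makes no claim about the cause inside PDFBox. The class name TruncationCheck and its helper methods are hypothetical and are not part of PDFBox or of the attached test code.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of PDFBox: compares an original file
// with the copy extracted from the PDF.
class TruncationCheck {

    /** Reads the stream until EOF; does not rely on available(). */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Returns the first offset at which the arrays differ, or -1 if they
     * are identical. If one array is a prefix of the other, the length of
     * the shorter array is returned.
     */
    static int firstDifference(byte[] original, byte[] extracted) {
        int common = Math.min(original.length, extracted.length);
        for (int i = 0; i < common; i++) {
            if (original[i] != extracted[i]) {
                return i;
            }
        }
        return original.length == extracted.length ? -1 : common;
    }

    private static byte[] readFile(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        try {
            return readAll(in);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java TruncationCheck <original> <extracted>");
            System.exit(1);
        }
        byte[] original = readFile(args[0]);
        byte[] extracted = readFile(args[1]);
        System.out.println("original:  " + original.length + " bytes");
        System.out.println("extracted: " + extracted.length + " bytes");
        System.out.println("first difference at offset: "
                + firstDifference(original, extracted));
    }
}
```

Run it with the file that was originally attached as the first argument and the file written by PDFEmbeddedFiles as the second. A reported offset equal to the extracted length would suggest the data is cut off at the end rather than altered in the middle.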

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
