[jira] [Updated] (PDFBOX-2293) NonSequential parser gives an error

Tilman Hausherr (JIRA) Wed, 27 Aug 2014 22:38:53 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tilman Hausherr updated PDFBOX-2293:
------------------------------------

    Description: 
I get the following error when using the sequential parser with PDFBox 1.8.5.
{code}
expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@eb43bd5: java.io.IOException:
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:628) [pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:605) [pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194) [pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1220) [pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187) [pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:236) [pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:185) [pdfbox-1.8.5.jar:]
{code}

After looking at some of the fixed issues reported for similar problems, I 
tried PDFBox 2.0.0, built from the latest repository code, with the 
non-sequential parser. However, the file created as the RandomAccessFile 
appears to be damaged after the run (it cannot be opened in Acrobat Reader). 
I also get an error and cannot generate the merged output. 
I am unable to attach a sample file because of privacy concerns about the 
content. 
The code snippet is as follows:
{code}
for (String fName : fileList) {
    pd = null;
    File pdFile = new File(fName);
    fNameStr = fName.substring(0, fName.lastIndexOf('.')) + "_new.pdf";

    InputStream is = new FileInputStream(pdFile);
    RandomAccessFile raf = new RandomAccessFile(pdFileNew, "rws");
    pd = PDDocument.loadNonSeq(is, raf);
    pd.getDocumentCatalog();
    pd.save(fNameStr);
    pd.close();
    if (is != null) {
        is.close();
    }
    if (raf != null) {
        raf.close();
    }

    ut.addSource(fNameStr);
}
FileOutputStream fos = new FileOutputStream(outFileName);
ut.setDestinationStream(fos);
ut.setIgnoreAcroFormErrors(true);
ut.mergeDocuments();
fos.close();
{code}
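
Since no sample file can be attached, a rough check that does not depend on PDFBox may help narrow down whether a saved file is truncated. The sketch below only tests that a file starts with the "%PDF-" header and has a "%%EOF" marker near its end; passing it does not prove the file is valid. The class name PdfSanityCheck, the method looksComplete and the 1024-byte search window are assumptions made for illustration, and RandomAccessFile here is the java.io class, not the PDFBox one. It assumes Java 7 or later because of StandardCharsets.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
class PdfSanityCheck {

    // How many trailing bytes are searched for %%EOF (arbitrary choice for this sketch).
    private static final int TAIL_WINDOW = 1024;

    // True if head starts with "%PDF-" and tail contains "%%EOF".
    static boolean looksComplete(byte[] head, byte[] tail) {
        String h = new String(head, StandardCharsets.ISO_8859_1);
        String t = new String(tail, StandardCharsets.ISO_8859_1);
        return h.startsWith("%PDF-") && t.contains("%%EOF");
    }

    // Reads only the first bytes and the last TAIL_WINDOW bytes of the file.
    static boolean looksComplete(File file) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            long len = raf.length();
            byte[] head = new byte[(int) Math.min(len, 8)];
            raf.readFully(head);
            byte[] tail = new byte[(int) Math.min(len, TAIL_WINDOW)];
            raf.seek(len - tail.length);
            raf.readFully(tail);
            return looksComplete(head, tail);
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ok = looksComplete(new File(name));
            System.out.println(name + ": "
                    + (ok ? "header and %%EOF found" : "header or %%EOF missing"));
        }
    }
}
```

Running it on each input file and on each "_new.pdf" file would show whether the damage is already present in the inputs or only appears after the save step.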

Thank You.

  was:
I get the following error when using the sequential parser with PDFBox 1.8.5.
expected='endstream' actual='' 
org.apache.pdfbox.io.PushBackInputStream@eb43bd5: java.io.IOException:  at 
org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:628) 
[pdfbox-1.8.5.jar:]
        at 
org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:605) 
[pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194) 
[pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1220) 
[pdfbox-1.8.5.jar:]
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187) 
[pdfbox-1.8.5.jar:]
        at 
org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:236)
 [pdfbox-1.8.5.jar:]
        at 
org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:185)
 [pdfbox-1.8.5.jar:]

After looking at some of the fixed issues reported for similar problems, I 
tried PDFBox 2.0.0, built from the latest repository code, with the 
non-sequential parser. However, the file created as the RandomAccessFile 
appears to be damaged after the run (it cannot be opened in Acrobat Reader). 
I also get an error and cannot generate the merged output. 
I am unable to attach a sample file because of privacy concerns about the 
content. 
The code snippet is as follows:
for (String fName : fileList) {
        pd = null;
        File pdFile = new File(fName);
        fNameStr = fName.substring(0, fName.lastIndexOf('.'))
                                        + "_new.pdf";

        InputStream is = new FileInputStream(pdFile);
        RandomAccessFile raf = new RandomAccessFile(pdFileNew, "rws");
                        pd = PDDocument.loadNonSeq(is, raf );
        pd.getDocumentCatalog();
        pd.save(fNameStr);
        pd.close();
        if (is != null) {
           is.close();
        }
        if(raf != null) {
          raf.close();
        }

        ut.addSource(fNameStr);
}
FileOutputStream fos = new FileOutputStream(outFileName);
ut.setDestinationStream(fos);
ut.setIgnoreAcroFormErrors(true);
ut.mergeDocuments();
fos.close();

Thank You.


> NonSequential parser gives an error
> -----------------------------------
>
>                 Key: PDFBOX-2293
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2293
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: Linux, JDK 1.6
>            Reporter: v gangolli



--
This message was sent by Atlassian JIRA
(v6.2#6252)
