[jira] [Commented] (PDFBOX-2293) NonSequential parser gives an error

v gangolli (JIRA) Wed, 10 Sep 2014 09:32:05 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128688#comment-14128688
 ]


v gangolli commented on PDFBOX-2293:
------------------------------------

The documents contain confidential information, so I cannot share them.
I have switched to using 1.8.6 instead of 2.0.0. With the non-sequential 
parser, I am able to merge the problem documents, with some warnings; 
however, some of the content gets lost. The documents in question may be 
malformed, as you have suggested, but pdftk is able to merge them without 
the content loss.
I have tried to build the source code for 1.8.6, but I get the following 
error and am not able to build. Could you please take a look? I am also 
looking into whether I can build some sample files that reproduce the 
behavior seen with the problem documents that pdftk processes successfully.

===
Failed to execute goal on project pdfbox: Could not resolve dependencies for 
project org.apache.pdfbox:pdfbox:bundle:1.8.6: Failed to collect dependencies 
for [org.apache.pdfbox:fontbox:jar:1.8.6 (compile), 
org.apache.pdfbox:jempbox:jar:1.8.6 (compile), 
commons-logging:commons-logging:jar:1.1.1 (compile), 
org.bouncycastle:bcmail-jdk15:jar:1.44 (compile?), 
org.bouncycastle:bcprov-jdk15:jar:1.44 (compile?), com.ibm.icu:icu4j:jar:3.8 
(compile?), junit:junit:jar:4.8.1 (test), 
com.levigo.jbig2:levigo-jbig2-imageio:jar:1.6.2 (test), 
net.java.dev.jai-imageio:jai-imageio-core-standalone:jar:1.2-pre-dr-b04-2011-07-04
 (test)]: Failed to read artifact descriptor for 
net.java.dev.jai-imageio:jai-imageio-core-standalone:jar:1.2-pre-dr-b04-2011-07-04:
 Could not transfer artifact 
net.java.dev.jai-imageio:jai-imageio-core-standalone:pom:1.2-pre-dr-b04-2011-07-04
 from/to mygrid-repository (http://www.mygrid.org.uk/maven/repository): Failed 
to transfer 
http://www.mygrid.org.uk/maven/repository/net/java/dev/jai-imageio/jai-imageio-core-standalone/1.2-pre-dr-b04-2011-07-04/jai-imageio-core-standalone-1.2-pre-dr-b04-2011-07-04.pom.
 Error code 503, Service Temporarily Unavailable -> [Help 1]
===

> NonSequential parser gives an error
> -----------------------------------
>
>                 Key: PDFBOX-2293
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2293
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: Linux, JDK 1.6
>            Reporter: v gangolli
>
> I get the following error when using the sequential parser with PDFBox 1.8.5.
> {code}
> expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@eb43bd5: java.io.IOException:
>         at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:628) [pdfbox-1.8.5.jar:]
>         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:605) [pdfbox-1.8.5.jar:]
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194) [pdfbox-1.8.5.jar:]
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1220) [pdfbox-1.8.5.jar:]
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187) [pdfbox-1.8.5.jar:]
>         at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:236) [pdfbox-1.8.5.jar:]
>         at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:185) [pdfbox-1.8.5.jar:]
> {code}
> After looking at some of the fixed issues reported for similar problems, I 
> tried using PDFBox 2.0.0, built from the latest repository code, with the 
> non-sequential parser for the PDF processing. However, the file created as 
> the RandomAccessFile seems to get damaged (it cannot be opened in Acrobat 
> Reader after the run) when I use PDFBox 2.0.0 for my processing. 
> I am unable to attach a sample file because of privacy concerns about the 
> content. I also get an error and am not able to generate the merged output. 
> The code snippet is as follows:
> {code}
> for (String fName : fileList) {
>     pd = null;
>     File pdFile = new File(fName);
>     // Output name: the input name with its extension replaced by "_new.pdf"
>     fNameStr = fName.substring(0, fName.lastIndexOf('.')) + "_new.pdf";
>     InputStream is = new FileInputStream(pdFile);
>     // Scratch file for the non-sequential parser
>     // (pdFileNew is not declared in this snippet)
>     RandomAccessFile raf = new RandomAccessFile(pdFileNew, "rws");
>     pd = PDDocument.loadNonSeq(is, raf);
>     pd.getDocumentCatalog();
>     // Re-save each input before handing it to the merger
>     pd.save(fNameStr);
>     pd.close();
>     if (is != null) {
>         is.close();
>     }
>     if (raf != null) {
>         raf.close();
>     }
>     ut.addSource(fNameStr);
> }
> // Merge the re-saved copies into a single output file
> FileOutputStream fos = new FileOutputStream(outFileName);
> ut.setDestinationStream(fos);
> ut.setIgnoreAcroFormErrors(true);
> ut.mergeDocuments();
> fos.close();
> {code}
> Thank You.
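
A side note on the snippet above: the expression 
fName.substring(0, fName.lastIndexOf('.')) throws a 
StringIndexOutOfBoundsException when the file name contains no dot, and it 
cuts the path short when the only dot is in a directory name. The following 
is a minimal sketch of a more defensive way to derive the "_new.pdf" name. 
It uses only the JDK and compiles on JDK 1.6; the class name OutputNames and 
the method name newPdfName are hypothetical and are not part of PDFBox or of 
the reporter's code.

```java
public class OutputNames {

    /**
     * Returns fName with its extension replaced by "_new.pdf".
     * If the last path segment has no extension, "_new.pdf" is appended.
     * (Hypothetical helper, not part of PDFBox.)
     */
    public static String newPdfName(String fName) {
        // Position of the last path separator, Unix or Windows style
        int sep = Math.max(fName.lastIndexOf('/'), fName.lastIndexOf('\\'));
        int dot = fName.lastIndexOf('.');
        // Treat the dot as an extension separator only when it sits inside
        // the last path segment and is not that segment's first character
        String base = (dot > sep + 1) ? fName.substring(0, dot) : fName;
        return base + "_new.pdf";
    }

    public static void main(String[] args) {
        System.out.println(newPdfName("/tmp/in/report.pdf"));  // /tmp/in/report_new.pdf
        System.out.println(newPdfName("/tmp/in.dir/report"));  // /tmp/in.dir/report_new.pdf
    }
}
```

This does not affect the parsing error itself; it only removes one unrelated 
way for the loop to fail before the merge is reached.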



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
