15.03.2011 18:16, [email protected]:
Does the patch from PDFBOX-908[1] fix this?  I reviewed that patch a while
ago but didn't have time to test it myself.  I don't normally commit
things without checking them myself, but if you can confirm that works,
I'll get it committed to the trunk.

[1] https://issues.apache.org/jira/browse/PDFBOX-908

No, it does not. The problem is in PDFParser, and PDFBOX-908 deals only with 'endobj' and the start of an object. My bug report (and fix) deals with the %%EOF handling, which is currently broken (although in most cases this does no harm, since whether %%EOF was read is only used to decide whether an exception is thrown).

PDFBOX-908 seems to have been applied already (at least partially). It might suffer from the same problem I reported in PDFBOX-978[2]: a lost newline after unreading.

[2] https://issues.apache.org/jira/browse/PDFBOX-978


Timo


From: "Timo Boehme (JIRA)" <[email protected]>
To: [email protected]
Date: 03/15/2011 02:37
Subject: [jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included)




https://issues.apache.org/jira/browse/PDFBOX-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006855#comment-13006855

Timo Boehme commented on PDFBOX-979:
------------------------------------

I have some bogus PDF files where content starts immediately after '%%EOF':

startxref
302041
%%EOF333 0 obj<</Length 15/Root

To handle this as in the 'endobj' case, I test whether the line starts with
'%%EOF' and, if so, unread all content that follows the marker.
New fixed version:

    String eof = "";
    if(!pdfSource.isEOF())
        eof = readLine(); // if there's more data to read, get the EOF flag

    // verify that EOF exists
    if(!"%%EOF".equals(eof)) {
        if( eof.startsWith( "%%EOF" ) ) {
            // content follows the marker -> unread it, preceded by a space byte for the consumed newline
            pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
            pdfSource.unread( eof.substring( 5 ).getBytes("ISO-8859-1") );
        } else {
            // PDF does not conform to spec, we should warn someone
            log.warn("expected='%%EOF' actual='" + eof + "'");
            // if we're not at the end of a file, just put it back and move on
            if(!pdfSource.isEOF()) {
                pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
                pdfSource.unread(eof.getBytes("ISO-8859-1"));
            }
        }
    }
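A minimal, self-contained sketch of the branch logic above follows; it is an illustration, not the PDFBox code itself. It assumes java.io.PushbackInputStream in place of pdfSource, a hypothetical readLine() helper that consumes the line terminator, and a plain space as SPACE_BYTE; the isEOF() checks and log.warn are simplified. Because pushback is last-in first-out, unreading the space before the remainder makes the remainder come back first:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EofUnreadSketch {
    // Assumption: a single space stands in for the line terminator readLine() consumed.
    private static final int SPACE_BYTE = ' ';

    // Hypothetical stand-in for the parser's readLine(): returns the bytes up to the
    // next CR, LF or CRLF and consumes the terminator itself.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                break;
            }
            if (c == '\r') {
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.unread(next); // lone CR: keep the byte that follows it
                }
                break;
            }
            sb.append((char) c); // ISO-8859-1: each byte maps to one char
        }
        return sb.toString();
    }

    // Mirrors the branch structure of the fixed version (isEOF() checks omitted).
    static void handleEof(PushbackInputStream in) throws IOException {
        String eof = readLine(in);
        if (!"%%EOF".equals(eof)) {
            if (eof.startsWith("%%EOF")) {
                // LIFO pushback: the space is unread first, so it is read back
                // after the content that followed the marker.
                in.unread(SPACE_BYTE);
                in.unread(eof.substring(5).getBytes(StandardCharsets.ISO_8859_1));
            } else {
                System.err.println("expected='%%EOF' actual='" + eof + "'");
                in.unread(SPACE_BYTE);
                in.unread(eof.getBytes(StandardCharsets.ISO_8859_1));
            }
        }
    }

    // Runs handleEof() on input positioned at the marker line; returns what is left to read.
    static String afterEof(String input) {
        try {
            PushbackInputStream in = new PushbackInputStream(
                    new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)), 1024);
            handleEof(in);
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                rest.append((char) c);
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints: [333 0 obj<</Length 15/Root ]
        System.out.println("[" + afterEof("%%EOF333 0 obj<</Length 15/Root") + "]");
    }
}
```

With the sample from the bogus file, the content after the marker ("333 0 obj<</Length 15/Root") is pushed back, followed by a space, so the next object can still be parsed.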


errors in %%EOF handling (fix included)
---------------------------------------

                 Key: PDFBOX-979
                 URL: https://issues.apache.org/jira/browse/PDFBOX-979
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.6.0
            Reporter: Timo Boehme

The '%%EOF' handling in PDFParser has several errors. The current
implementation (starting at line 467):
    String eof = "";
    if(!pdfSource.isEOF())
        readLine(); // if there's more data to read, get the EOF flag

    // verify that EOF exists
    if("%%EOF".equals(eof)) {
        // PDF does not conform to spec, we should warn someone
        log.warn("expected='%%EOF' actual='" + eof + "'");
        // if we're not at the end of a file, just put it back and move on
        if(!pdfSource.isEOF())
            pdfSource.unread(eof.getBytes("ISO-8859-1"));
    }
The problems:
- the eof variable is never assigned (the return value of readLine() is discarded)
- the condition if("%%EOF".equals(eof)) must be negated
- unreading must first add a newline or space byte, because readLine() has
  consumed the line terminator (as in bug PDFBOX-978)
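To illustrate the third point, here is a small sketch that reads a line, pushes it back, and shows what the next read sees with and without a separator byte. It uses java.io.PushbackInputStream and a hypothetical, simplified readLine() rather than the actual PDFParser internals:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LostNewlineSketch {
    // Hypothetical, simplified readLine(): reads up to '\n' and consumes the '\n'.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            sb.append((char) c); // ISO-8859-1: each byte maps to one char
        }
        return sb.toString();
    }

    // Reads one line, pushes it back (optionally with a trailing space), and returns
    // the remaining stream content as the next tokenizer would see it.
    static String remainingAfterUnread(String input, boolean addSeparator) {
        try {
            PushbackInputStream in = new PushbackInputStream(
                    new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)), 1024);
            String line = readLine(in); // the '\n' has been consumed here
            if (addSeparator) {
                in.unread(' '); // unread first, so it is read back after the line
            }
            in.unread(line.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                rest.append((char) c);
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(remainingAfterUnread("garbage\n1 0 obj", false)); // garbage1 0 obj
        System.out.println(remainingAfterUnread("garbage\n1 0 obj", true));  // garbage 1 0 obj
    }
}
```

Without the separator, the pushed-back line is glued to the start of the next line ("garbage1 0 obj"), so the following token is corrupted; with the space byte the two stay apart.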
Corrected version:
    String eof = "";
    if(!pdfSource.isEOF())
        eof = readLine(); // if there's more data to read, get the EOF flag

    // verify that EOF exists
    if(!"%%EOF".equals(eof)) {
        // PDF does not conform to spec, we should warn someone
        log.warn("expected='%%EOF' actual='" + eof + "'");
        // if we're not at the end of a file, just put it back and move on
        if(!pdfSource.isEOF()) {
            pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
            pdfSource.unread(eof.getBytes("ISO-8859-1"));
        }
    }








--

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780472
 F: +49 345 4780471
 [email protected]

_____________________________________________________________________

 OntoChem GmbH
 Managing Director: Dr. Lutz Weber
 Registered office: Halle / Saale
 Register court: Stendal
 Registration number: HRB 215461
_____________________________________________________________________
