Conforming parser
-----------------

                 Key: PDFBOX-1000
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1000
             Project: PDFBox
          Issue Type: New Feature
          Components: Parsing
            Reporter: Adam Nichols
            Assignee: Adam Nichols


A conforming parser starts at the end of the file and reads backward until it
has read the EOF marker, the xref location, and the trailer[1].  It then reads
the xref table, which tells it where every other object and revision is
located.  The xref table also allows the parser to skip objects which have been
rendered obsolete[2].  Finally, only the minimum amount of information has to
be read when the file is loaded; everything else is loaded if and when it is
requested.  All of this is laid out in the official PDF specification,
ISO 32000-1:2008.
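The first two steps of a conforming read (find the xref location from the end
of the file, then read the xref table) can be sketched in Java. This is a
minimal illustration under stated assumptions, not PDFBox code: the names
`TailFirstXrefReader`, `findStartXref` and `readXref` are hypothetical, the
whole file is assumed to be in memory, the 1024-byte tail window is an
arbitrary choice, and only a classic `xref` table from a single revision is
handled (no cross-reference streams, no `/Prev` chain from incremental updates).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not PDFBox code. Assumes the whole file is in memory,
 * a classic "xref" table (no cross-reference streams) and one revision only.
 */
public final class TailFirstXrefReader {

    // Assumed size of the tail window that must contain startxref and %%EOF.
    private static final int TAIL_WINDOW = 1024;

    /** Step 1: search backward from the end of the file for the startxref offset. */
    public static long findStartXref(byte[] pdf) {
        int tailStart = Math.max(0, pdf.length - TAIL_WINDOW);
        // ISO-8859-1 maps every byte to one char, so string indices match byte indices.
        String tail = new String(pdf, tailStart, pdf.length - tailStart,
                StandardCharsets.ISO_8859_1);
        int eof = tail.lastIndexOf("%%EOF");          // end marker of the last revision
        if (eof < 0) {
            throw new IllegalArgumentException("no %%EOF in file tail");
        }
        int kw = tail.lastIndexOf("startxref", eof);  // keyword preceding %%EOF
        if (kw < 0) {
            throw new IllegalArgumentException("no startxref before %%EOF");
        }
        // The decimal byte offset sits between the keyword and %%EOF.
        String digits = tail.substring(kw + "startxref".length(), eof).trim();
        return Long.parseLong(digits);
    }

    /** Step 2: read the xref table at that offset; free ('f') entries are skipped. */
    public static Map<Integer, Long> readXref(byte[] pdf, long startxref) {
        int from = (int) startxref;
        String text = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        String[] tok = text.trim().split("\\s+");
        if (!tok[0].equals("xref")) {
            throw new IllegalArgumentException("expected 'xref' at offset " + startxref);
        }
        Map<Integer, Long> offsets = new HashMap<>();
        int i = 1;
        // Each subsection is "firstObj count" followed by count "offset gen n|f" entries.
        while (i < tok.length && !tok[i].equals("trailer")) {
            int first = Integer.parseInt(tok[i]);
            int count = Integer.parseInt(tok[i + 1]);
            i += 2;
            for (int k = 0; k < count; k++, i += 3) {
                // tok[i] = byte offset, tok[i + 1] = generation (unused), tok[i + 2] = n or f
                if (tok[i + 2].equals("n")) {
                    offsets.put(first + k, Long.parseLong(tok[i]));
                }
            }
        }
        return offsets;
    }
}
```

Entries marked `f` are free (deleted) objects, so they never reach the offset
map and are never read. A complete reader would also follow the trailer's
`/Prev` entry to older xref sections and let the newest entry for a given
object number win.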

Existing code will be reused where possible, but new classes are needed to
accommodate lazy reading, which is a very different paradigm from the existing
parser.  Keeping the new code in separate classes also prevents regression bugs
from making their way into the PDDocument or BaseParser classes; for the same
reason, changes to existing classes will be kept to a minimum.
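The lazy-reading paradigm can be illustrated with a small resolver that starts
with nothing but the xref offsets and reads an object only the first time it
is requested. Again a hypothetical sketch rather than a proposed design:
`LazyObjectResolver`, `getObject` and `reads` are invented names, and the
resolver returns the raw text between `obj` and `endobj` instead of parsed
objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not PDFBox code: at load time only the xref offsets
 * are known; an object's bytes are read the first time it is requested and
 * cached afterwards. Returns the raw object body, not parsed objects.
 */
public final class LazyObjectResolver {

    private final byte[] pdf;                      // whole file, already in memory
    private final Map<Integer, Long> xrefOffsets;  // object number -> byte offset
    private final Map<Integer, String> cache = new HashMap<>();
    private int reads = 0;                         // objects actually read so far

    public LazyObjectResolver(byte[] pdf, Map<Integer, Long> xrefOffsets) {
        this.pdf = pdf;
        this.xrefOffsets = xrefOffsets;
    }

    /** Returns the text between "obj" and "endobj", reading it only on first use. */
    public String getObject(int objectNumber) {
        return cache.computeIfAbsent(objectNumber, this::readObject);
    }

    /** Number of objects read from the file so far (to observe the laziness). */
    public int reads() {
        return reads;
    }

    private String readObject(Integer objectNumber) {
        Long offset = xrefOffsets.get(objectNumber);
        if (offset == null) {
            throw new IllegalArgumentException("object " + objectNumber + " is not in the xref");
        }
        reads++;
        int from = offset.intValue();
        String rest = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        int bodyStart = rest.indexOf("obj");       // end of the "N G obj" header
        int bodyEnd = rest.indexOf("endobj", bodyStart);
        if (bodyStart < 0 || bodyEnd < 0) {
            throw new IllegalStateException("malformed object " + objectNumber);
        }
        return rest.substring(bodyStart + "obj".length(), bodyEnd).trim();
    }
}
```

Because `Map.computeIfAbsent` only calls `readObject` for a missing key, each
object is read from the file at most once and later requests come from the
cache. The cache is a plain `HashMap`, so this sketch is not thread-safe.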

[1] ISO 32000-1:2008, Section 7.5.5: "Conforming readers should read a PDF
    file from its end"
[2] ISO 32000-1:2008, Section 7.5.4: "the entire file need not be read to
    locate any particular object"

--
This message is automatically generated by JIRA.