I'm trying to write a conforming parser, which should help with various 
issues, and I'm hoping someone can help me understand the PDF spec so I 
can implement it exactly as specified.

I noticed that section 7.5.5 of ISO 32000-1:2008 says the value following 
startxref is the byte offset in "the decoded stream".  It seems strange 
that it would be the *decoded* position when the first thing to do is skip 
to the end of the file and read the %%EOF marker, the xref location, and 
the trailer info.  Does this mean the expected process is to read and 
decode the entire stream and write it to a temp file (or hold it in 
memory) before skipping to the end, reading the %%EOF marker, etc.?
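
To make the question concrete, here is a minimal sketch of the tail read I 
have in mind.  The class and method names are hypothetical (not PDFBox 
API), the 1024-byte tail window is an arbitrary assumption, and the offset 
is returned as written in the file, without deciding whether it refers to 
raw or decoded bytes:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class StartXrefReader {

    /** Returns the number after the last "startxref" keyword, or -1 if absent. */
    static long findStartXref(byte[] tail) {
        // ISO-8859-1 maps every byte to exactly one char, so indices stay aligned.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int i = idx + "startxref".length();
        // Skip the end-of-line (and any other whitespace) after the keyword.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        if (i == start) {
            return -1; // keyword present but no offset follows it
        }
        return Long.parseLong(text.substring(start, i));
    }

    /** Reads at most the last 1024 bytes (assumed window) and scans them. */
    static long readStartXref(RandomAccessFile raf) throws IOException {
        long len = raf.length();
        int tailSize = (int) Math.min(1024, len);
        byte[] tail = new byte[tailSize];
        raf.seek(len - tailSize);
        raf.readFully(tail);
        return findStartXref(tail);
    }
}
```

If the offset really is relative to a decoded form of the file, the value 
this returns could not be used to seek in the original bytes, which is the 
part I'm unsure about.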

If this is correct, I'll just read in the File/InputStream/URL/URI/etc. 
and decode/write it to a RandomAccess object.  This should keep memory 
usage low since I'll be working off the RandomAccess object, so a 500MB 
PDF won't require 500MB of memory (and I have dealt with PDFs this large).
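
Roughly, the spooling step could look like the sketch below.  It uses only 
the standard library; TempFileBuffer and spool are hypothetical names, and 
it copies the bytes as they arrive with no decoding step, since whether a 
decoding step is needed is exactly what I'm asking:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox.
final class TempFileBuffer {

    /**
     * Copies the whole stream to a temp file so the parser can seek freely
     * without holding the document in memory. The caller deletes the file.
     */
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-spool-", ".tmp");
        // createTempFile already created the file, so overwrite it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

The returned path can then be opened with 
new RandomAccessFile(path.toFile(), "r") for seeking.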

Finally, as a test, I ran WriteDecodedDoc on my test document, expecting 
the xref table to match up, but it still wasn't pointing to the location I 
expected.  Is there any existing code in PDFBox which would help me 
read/decode/write a PDF?

Any other suggestions, words of warning, etc.?  Like, how should I deal 
with violations of the spec?  Log and ignore, throw exception, have an 
object which deals with exceptions on a case-by-case basis?  It'd be 
pretty cool to have an object which would be smart enough to look and see 
"Read: '%%EO'; Expected: '%%EOF'" and not throw an exception, but if it 
were "Read: 'obj 49 0'; Expected: '%%EOF'" it might throw an exception. 
But I'm not going to go through the work of doing all that unless people 
will actually find it useful.  Maybe the conforming PDF parser could just 
throw an exception for non-conforming documents and then fall back to the 
PDFParser?  I'm looking for input from the community here.  Let me know 
what you think.
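
To illustrate the case-by-case idea, here is a rough sketch of such an 
object.  Everything in it is hypothetical: the ViolationPolicy name, the 
three actions, and the rule that a truncated token of at least three 
characters is tolerated are assumptions chosen only to reproduce the 
'%%EO' versus 'obj 49 0' example above.

```java
// Hypothetical policy object, not part of PDFBox.
final class ViolationPolicy {

    enum Action { ACCEPT, WARN, FAIL }

    /** Decides what to do when the parser expected one token and read another. */
    static Action classify(String expected, String actual) {
        if (expected.equals(actual)) {
            return Action.ACCEPT;
        }
        // Assumed rule: a truncated but recognisable token, such as "%%EO"
        // for "%%EOF", is logged and tolerated. The minimum length of 3 is
        // an arbitrary threshold.
        if (actual.length() >= 3 && expected.startsWith(actual)) {
            return Action.WARN;
        }
        // Anything else, such as "obj 49 0" where "%%EOF" was expected.
        return Action.FAIL;
    }
}
```

The parser would log on WARN and throw on FAIL; a stricter or looser 
policy could be swapped in without touching the parsing code.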

---- 
Thanks,
Adam


