I'm trying to write a conforming parser (which should help with various issues), and I'm hoping someone can help me understand the PDF spec so I can implement it exactly to the specification.
I noticed that section 7.5.5 of ISO 32000-1:2008 says the startxref value is the byte offset "in the decoded stream". It seems strange that this would be a *decoded* position, given that the first thing a parser does is skip to the end of the file and read the %%EOF marker, the xref location, and the trailer. Does this mean the expected process is to read and decode the entire stream and write it to a temp file (or hold it in memory) before skipping to the end and reading the %%EOF marker, etc.?

If so, I'll just read in the File/InputStream/URL/URI/etc. and decode/write it to a RandomAccess object. Since I'll be working off the RandomAccess object rather than holding everything in memory, this should keep memory usage low, so a 500MB PDF won't require 500MB of memory (and I have dealt with PDFs this large).

As a test, I ran WriteDecodedDoc on my test document and expected the xref table to match up, but it still wasn't pointing to the location I expected. Is there any existing code in PDFBox that would help me read/decode/write a PDF?

Any other suggestions, words of warning, etc.? For example, how should I deal with violations of the spec: log and ignore, throw an exception, or have an object that deals with them on a case-by-case basis? It'd be pretty cool to have an object smart enough to see "Read: '%%EO'; Expected: '%%EOF'" and not throw an exception, but throw one for "Read: 'obj 49 0'; Expected: '%%EOF'". But I'm not going to do all that work unless people will actually find it useful. Maybe the conforming PDF parser could just throw an exception for non-conforming documents and then fall back to the PDFParser?

I'm looking for input from the community here. Let me know what you think.
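A minimal sketch of the tail-reading step described above, in Java since this is PDFBox territory. It assumes the startxref offset is counted against the raw file bytes as stored (not against a copy with its streams decoded), and it illustrates one possible leniency policy: a truncated marker such as '%%EO' is recorded as a warning, while anything else at the end of the file is an error. TailReader, readTail, findStartXref and TAIL_SIZE are hypothetical names, not PDFBox API, and the 1024-byte tail window is an arbitrary assumption.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: finds the startxref offset near the end of a PDF.
class TailReader {

    // Arbitrary assumption: how many bytes at the end of the file to scan.
    static final int TAIL_SIZE = 1024;

    // Reads the last tailSize bytes of the file. ISO-8859-1 maps each byte to
    // exactly one char, so string indices equal byte positions within the tail.
    static String readTail(String path, int tailSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int n = (int) Math.min(len, tailSize);
            byte[] buf = new byte[n];
            raf.seek(len - n);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.ISO_8859_1);
        }
    }

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isPdfWhitespace(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    // Returns the offset after the last "startxref" keyword. Recoverable problems
    // (a truncated "%%EOF") are added to warnings; anything else throws.
    static long findStartXref(String tail, List<String> warnings) {
        int end = tail.length();
        while (end > 0 && isPdfWhitespace(tail.charAt(end - 1))) {
            end--;
        }
        String t = tail.substring(0, end);

        if (!t.endsWith("%%EOF")) {
            int pct = t.lastIndexOf("%%");
            String found = pct >= 0 ? t.substring(pct) : "";
            if (pct >= 0 && "%%EOF".startsWith(found)) {
                // e.g. Read: '%%EO'; Expected: '%%EOF' -> tolerate, but record it.
                warnings.add("Read: '" + found + "'; Expected: '%%EOF'");
            } else {
                // e.g. Read: 'obj 49 0' -> not a plausible marker, so fail.
                throw new IllegalStateException("No %%EOF marker at end of file");
            }
        }

        int kw = t.lastIndexOf("startxref");
        if (kw < 0) {
            throw new IllegalStateException("startxref keyword not found in tail");
        }
        int i = kw + "startxref".length();
        while (i < t.length() && isPdfWhitespace(t.charAt(i))) {
            i++;
        }
        int digitsStart = i;
        while (i < t.length() && t.charAt(i) >= '0' && t.charAt(i) <= '9') {
            i++;
        }
        if (i == digitsStart) {
            throw new IllegalStateException("No offset after startxref");
        }
        return Long.parseLong(t.substring(digitsStart, i));
    }

    public static void main(String[] args) throws IOException {
        List<String> warnings = new ArrayList<>();
        long offset = findStartXref(readTail(args[0], TAIL_SIZE), warnings);
        for (String w : warnings) {
            System.err.println("warning: " + w);
        }
        System.out.println("startxref = " + offset);
    }
}
```

Running `java TailReader some.pdf` prints the offset; a parser would then seek to it and check that an `xref` keyword (or, for PDF 1.5+ files, a cross-reference stream object) actually starts there. Collecting recoverable deviations in a list instead of throwing is one way to get the case-by-case behaviour; a hard failure from the strict path could be the trigger for falling back to PDFParser.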
----
Thanks,
Adam
