I'm trying to find a solution to the problem of documents that contain multiple objects with the same object ID and revision (generation) number. Some of my documents cause an NPE and therefore cannot be merged. I realize these files are out of spec, but when they are opened with Adobe Reader, they render just fine. So (non-technical) people figure that if Adobe Reader can read it, why can't our software deal with it?
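To make the problem concrete, here is a rough sketch of one way to spot such duplicates by scanning the raw bytes for "N G obj" headers. This is not how PDFBox parses a file; the class and method names (DuplicateObjectScan, findDuplicates) are made up for illustration. A naive scan like this will also match header-like text inside stream data, and it will flag objects that were legitimately redefined by an incremental update, so treat the output as a hint only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
public class DuplicateObjectScan {
    // Matches an indirect object header such as "12 0 obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");

    /**
     * Returns a map from "number generation" to the byte offsets of each
     * definition, keeping only the keys that are defined more than once.
     */
    public static Map<String, List<Integer>> findDuplicates(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so match offsets equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, List<Integer>> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            String key = m.group(1) + " " + m.group(2);
            offsets.computeIfAbsent(key, k -> new ArrayList<>()).add(m.start());
        }
        // Drop every object that has a single definition.
        offsets.values().removeIf(list -> list.size() < 2);
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DuplicateObjectScan <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        findDuplicates(pdf).forEach((key, where) ->
                System.out.println(key + " obj defined at offsets " + where));
    }
}
```

Running it against a suspect file lists each duplicated "number generation" pair with the offsets where it is defined, which should be enough to describe a problem file without sharing its contents.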
I found some code in COSObject::setObject() that seems to take a crack at solving this, but it is all commented out. I uncommented it hoping it would magically solve my problem, but no such luck. Does anyone know who wrote that code so I can collaborate with them? (The SVN history didn't have anything.)

According to Neil[1], the best thing to do would be to rewrite the parser. I'm not opposed to rewriting the parser if that will solve my issue, but I need to understand how it currently works and how it should work before I can take on something like that. I noticed that section 7.5.5 (File Trailer) of the PDF spec says "Conforming readers should read a PDF file from its end." and I'm pretty sure PDFParser::parse() doesn't do that. Does anyone think fixing this in COSObject would be any faster than rewriting the parser?

The documents I have are all confidential, so unfortunately I can't share them, but there are some other[1] issues[2] which seem to be somewhat related. I'm going to keep looking for a file I can get approved for release so I can upload it to JIRA with an exact stack trace and everything.

[1] https://issues.apache.org/jira/browse/PDFBOX-569
[2] https://issues.apache.org/jira/browse/PDFBOX-720

----
Thanks,
Adam
