On 1/9/15, 12:25 AM, "John Hewson" <j...@jahewson.com> wrote:
>We have some support for incremental update in PDFBox already, but I
>don’t see any reason why that should be limited by sharing objects. A hash
>map of COS objects in COSDocument is sufficient to track any update
>state specific to an individual COS object in a given document and has the
>added benefit of keeping document state out of COS object classes.

Such an implementation can work for simple things, but at some point you will run into its limitations. But since it’s working now and no one is banging on your door to fix…

>Alternatively, should we wish to store document state inside COS objects,
>then we would have all the information necessary to generate a meaningful
>error should an incremental update be attempted on a COS object which
>belongs to another document. In this case the solution is for the user to
>clone() the relevant COS object - this feels natural.

Yup - that makes sense and sounds like a good solution.

>PDFBox doesn’t store the object number and revision in its COS object
>classes, so that’s not a problem for us. These numbers are instead stored
>in a hash map inside COSDocument. That means that each COS object is
>independent of a specific COSDocument, with the exception of the
>backing stream for a COSStream. I realise that this might be unusual.

It’s all about design requirements. So far you haven’t had a requirement to navigate the object model in such a way that this design fails you. In other implementations, such as Acrobat/Reader, it wouldn’t work.

>Currently we don’t do on-demand decryption, but if we did, then the
>backing stream which is passed to COSStream could handle this.

That would work for streams, but not for strings. I suspect that today you decrypt each string as it is read, so the in-memory representation is always unencrypted. You’d need to add on-demand decryption to strings as well if you wanted to enable this feature in the future. (Admittedly probably not a requirement for server-side solutions, but a huge deal for desktop and mobile!)

>No, because the data has been erased. Calling close() on a COSDocument
>loops through a hash map of every COS object from that document and
>clears its contents. We’re in the process of figuring out why exactly that
>is and if it is necessary for objects other than COSStream.

Given your model, I agree that clearing the contents doesn’t seem like the right thing to do. But it would be useful to have a flag on the object indicating that the owning document has been closed.

>What I’m proposing is a fairly unexciting change to COSDocument’s close()
>method, but it’s yielded a useful discussion - assuming that we’re now all
>on the same page :)
>

Excellent discussion. I appreciate your taking the time to explain some of PDFBox’s inner workings to me.

Leonard
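
P.S. A minimal sketch of the hash-map approach discussed above, for illustration only. All names here (SketchObject, SketchDocument, register, markForUpdate) are hypothetical and are not PDFBox's API. It assumes an identity-keyed map is used so that per-document state stays out of the object class, and it records closure on the document side rather than erasing object contents. A flag on the object itself, as suggested above, would additionally need some link from the object back to its document.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a COS object: it carries no document state.
final class SketchObject {
    final String value;

    SketchObject(String value) {
        this.value = value;
    }
}

// Hypothetical stand-in for a document that tracks per-object state itself.
final class SketchDocument {
    // Bookkeeping that would otherwise live inside the object class.
    private static final class State {
        final long objectNumber;
        final int generation;
        boolean needsUpdate;

        State(long objectNumber, int generation) {
            this.objectNumber = objectNumber;
            this.generation = generation;
        }
    }

    // Keyed by identity, so two equal-looking objects are tracked separately.
    private final Map<SketchObject, State> states = new IdentityHashMap<>();
    private boolean closed;

    void register(SketchObject obj, long objectNumber, int generation) {
        states.put(obj, new State(objectNumber, generation));
    }

    boolean owns(SketchObject obj) {
        return states.containsKey(obj);
    }

    // Flags an object for the next incremental save, rejecting foreign objects.
    void markForUpdate(SketchObject obj) {
        if (closed) {
            throw new IllegalStateException("document is closed");
        }
        State state = states.get(obj);
        if (state == null) {
            throw new IllegalArgumentException(
                    "object is not registered with this document; copy it first");
        }
        state.needsUpdate = true;
    }

    boolean needsUpdate(SketchObject obj) {
        State state = states.get(obj);
        return state != null && state.needsUpdate;
    }

    // Records closure without clearing the contents of any object.
    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

With this shape, an object registered with one document can be flagged for update there, while a second document that never registered it reports a clear error instead of silently sharing state.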