Hmm ... yeah, I'm afraid you do need an account.

On Mon, Aug 29, 2011 at 5:24 PM, Stefan Mücke <[email protected]> wrote:
> > License considerations severely limit what we can do with patches
> > provided via the mailing list.
> >
> > Would you please create an issue at
> > https://issues.apache.org/jira/browse/PDFBOX ? When you attach your
> > patch files, please check the box that grants us the right to use the
> > files.
>
> Okay, but I have trouble finding an "Add/New/Create/Report issue" button.
> Do I need to have an account? I don't really want to create one.
>
> Stefan
>
> > May sound silly ... but ... it keeps everything legal!
> >
> > Thanks!
> >
> > Daniel
> >
> > On Sat, Aug 27, 2011 at 5:04 PM, Stefan Mücke <[email protected]> wrote:
> >
> > > Hi PDFBox committers,
> > >
> > > I would like to contribute a bug fix for a long-standing, major problem
> > > in PDFBox.
> > >
> > > PDFBox uses a scratch file to reduce memory consumption. However, there
> > > is no mechanism that prevents two PDStreams from writing to the
> > > scratch file at the same time. When this happens, the resulting PDF
> > > contains garbage in some streams. This problem occurred to me several
> > > times (e.g. when writing to an image stream while constructing a
> > > page).
> > >
> > > Reproducing the bug
> > > *******************
> > >
> > > One can easily reproduce the bug. Open file AddImageToPDF.java and move
> > > the following line:
> > >
> > >     PDPageContentStream contentStream =
> > >         new PDPageContentStream(doc, page, true, true);
> > >
> > > immediately after the line in which the PDPage object is fetched:
> > >
> > >     PDPage page =
> > >         (PDPage)doc.getDocumentCatalog().getAllPages().get( 0 );
> > >
> > > With this modification, one will still get a PDF file, but Acrobat
> > > Reader will report that the image could not be processed. BTW, the
> > > files AddImageToPDF.java and ImageToPDF.java are almost identical. One
> > > of them should be deleted.
> > >
> > > Bug-Fix
> > > *******
> > >
> > > The problem can be solved by using a scratch file that is divided into
> > > pages (e.g. of 4 KB). Each PDStream in the scratch file is then
> > > associated with a list of pages. This list grows as more data is
> > > written to the stream.
> > >
> > > The bug fix requires minimal changes to the existing code. The very
> > > nice RandomAccess interface made this very easy.
> > >
> > > Here is what needs to be changed:
> > >
> > > - Add the attached "PagedMultiRandomAccessFile.java" to the I/O
> > >   package
> > > - Change COSDocument.getScratchFile() to return a
> > >   RandomAccess instance provided by PagedMultiRandomAccessFile:
> > >
> > >     private PagedMultiRandomAccessFile scratchFile = null;
> > >
> > >     [...]
> > >
> > >     public COSDocument(File scratchDir) throws IOException {
> > >         tmpFile = File.createTempFile("pdfbox", "tmp", scratchDir);
> > >         scratchFile = new PagedMultiRandomAccessFile(
> > >             new RandomAccessFile(tmpFile, "rw"));
> > >     }
> > >
> > >     public COSDocument(RandomAccess file) {
> > >         // scratchFile = file;
> > >         throw new RuntimeException("Not yet implemented."); //$NON-NLS-1$
> > >     }
> > >
> > >     [...]
> > >
> > >     /**
> > >      * Returns a new scratch file.
> > >      *
> > >      * @return the newly created scratch file
> > >      */
> > >     public RandomAccess getScratchFile() {
> > >         return scratchFile.getNewRandomAcess();
> > >     }
> > >
> > > One of the COSDocument constructors takes a RandomAccess file. This
> > > constructor is only called in a single location, namely, in method
> > > PDFParser.parse(). I am not sure if the RandomAccess parameter provided
> > > here is really a scratch file. Someone will have to decide what to do
> > > with this one.
> > >
> > > The code has been thoroughly tested and has been used in the production
> > > of several books without any problems.
> > >
> > > In the attachment please find the code. There is also a JUnit test that
> > > was used to debug my code. I have added an Apache license header and
> > > adopted PDFBox's code style. Feel free to make any desired changes.
> > >
> > > Best regards,
> > >
> > > Stefan Mücke
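
For readers following the thread without the attachment: the paged scratch file idea described above (fixed-size pages, one page list per stream, the list growing as data is written) might be sketched roughly as follows. This is NOT the attached PagedMultiRandomAccessFile.java and does not use the PDFBox RandomAccess interface; the class names PagedScratchFile and PagedStream, the append-only write/readAll methods, and the sequential page allocator are all assumptions made for illustration. Only java.io.RandomAccessFile is a real API here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: several logical streams sharing one backing file via fixed-size pages. */
class PagedScratchFile {
    static final int PAGE_SIZE = 4096; // 4 KB pages, as suggested in the mail

    private final RandomAccessFile backing;
    private int nextFreePage = 0; // pages are handed out sequentially; no reuse in this sketch

    PagedScratchFile(RandomAccessFile backing) {
        this.backing = backing;
    }

    /** Returns a new logical stream that owns its own list of pages. */
    PagedStream newStream() {
        return new PagedStream();
    }

    private synchronized int allocatePage() {
        return nextFreePage++;
    }

    /** One logical stream; its data lives only in the pages recorded in its page list. */
    class PagedStream {
        private final List<Integer> pages = new ArrayList<>();
        private long length = 0;

        /** Appends data, allocating a fresh page whenever the current one is full. */
        void write(byte[] data) throws IOException {
            int off = 0;
            while (off < data.length) {
                int pageIndex = (int) (length / PAGE_SIZE);
                int pageOffset = (int) (length % PAGE_SIZE);
                if (pageIndex == pages.size()) {
                    pages.add(allocatePage());
                }
                int n = Math.min(PAGE_SIZE - pageOffset, data.length - off);
                synchronized (backing) { // seek + write must not interleave with other streams
                    backing.seek((long) pages.get(pageIndex) * PAGE_SIZE + pageOffset);
                    backing.write(data, off, n);
                }
                off += n;
                length += n;
            }
        }

        /** Reads the whole logical stream back by walking its page list in order. */
        byte[] readAll() throws IOException {
            byte[] out = new byte[(int) length];
            int off = 0;
            while (off < out.length) {
                int pageIndex = off / PAGE_SIZE;
                int pageOffset = off % PAGE_SIZE;
                int n = Math.min(PAGE_SIZE - pageOffset, out.length - off);
                synchronized (backing) {
                    backing.seek((long) pages.get(pageIndex) * PAGE_SIZE + pageOffset);
                    backing.readFully(out, off, n);
                }
                off += n;
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("scratch", "tmp");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            PagedScratchFile scratch = new PagedScratchFile(raf);
            PagedStream image = scratch.newStream();
            PagedStream content = scratch.newStream();
            image.write("image-data".getBytes());     // writes to the two streams interleave
            content.write("page-content".getBytes());
            image.write("-more".getBytes());
            System.out.println(new String(image.readAll()));   // prints "image-data-more"
            System.out.println(new String(content.readAll())); // prints "page-content"
        }
    }
}
```

Because each stream only ever touches pages from its own list, interleaved writes (such as an image stream and a page content stream being filled at the same time) cannot overwrite each other's data. A real implementation would also need random-access seeks within a stream and probably page reuse after a stream is closed; both are left out of this sketch.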
