Re: Bug-Fix for Scratch File Bug

Stefan Mücke Mon, 29 Aug 2011 12:47:43 -0700

Here's the issue with the attached code:
https://issues.apache.org/jira/browse/PDFBOX-1109



> > License considerations severely limit what we can do with patches
> > provided via the mailing list.
> >
> > Would you please create an issue at
> > https://issues.apache.org/jira/browse/PDFBOX ?  When you attach your
> > patch files, please check the box that grants us the right to use the 
> > files.
>
> Okay, but I have trouble finding an "Add/New/Create/Report issue" button.
> Do I need to have an account? I don't really want to create one.
>
> Stefan
>
>
> > May sound silly ... but ... it keeps everything legal!
> >
> > Thanks!
> >
> > Daniel
> >
> > On Sat, Aug 27, 2011 at 5:04 PM, Stefan Mücke <[email protected]>
> > wrote:
> > > Hi PDFBox committers,
> > >
> > > I would like to contribute a bug fix for a long-standing, major
> > > problem in PDFBox.
> > >
> > > PDFBox uses a scratch file to reduce memory consumption. However,
> > > there is no mechanism that prevents two PDStreams from writing to
> > > the scratch file at the same time, i.e. one stream writing while
> > > another one is still open. When this happens, the resulting PDF
> > > contains garbage in some streams. I have run into this problem
> > > several times (e.g. when writing to an image stream while
> > > constructing a page).
> > >
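The failure mode can be illustrated with a small in-memory model. It assumes, for illustration only, that each stream records just a start offset and a length and expects its bytes to be contiguous in the shared scratch file. ScratchFileBugDemo, ContiguousStream and demo() are hypothetical names; this is not PDFBox code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical model of two streams sharing one append-only scratch buffer. */
public class ScratchFileBugDemo {

    /** A stream that assumes its bytes occupy [start, start + length) of the shared buffer. */
    static final class ContiguousStream {
        private final ByteArrayOutputStream scratch;
        private final int start;  // offset recorded when the stream is opened
        private int length = 0;   // total bytes written through this stream

        ContiguousStream(ByteArrayOutputStream scratch) {
            this.scratch = scratch;
            this.start = scratch.size();
        }

        void write(String s) {
            byte[] b = s.getBytes(StandardCharsets.US_ASCII);
            scratch.write(b, 0, b.length); // always appends at the shared end of the buffer
            length += b.length;
        }

        String readBack() {
            byte[] all = scratch.toByteArray();
            return new String(all, start, length, StandardCharsets.US_ASCII);
        }
    }

    /** Opens two streams, interleaves their writes, and returns what each one reads back. */
    public static String[] demo() {
        ByteArrayOutputStream scratch = new ByteArrayOutputStream();
        ContiguousStream image = new ContiguousStream(scratch);
        ContiguousStream content = new ContiguousStream(scratch); // opened before image is finished
        image.write("IMG1");
        content.write("q Q ");   // the other stream writes in between
        image.write("IMG2");
        return new String[] { image.readBack(), content.readBack() };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("image:   [" + r[0] + "]");
        System.out.println("content: [" + r[1] + "]");
    }
}
```

In this model the image stream reads back "IMG1q Q " instead of "IMG1IMG2", and the content stream reads back "IMG1": with a single shared append position, neither stream gets its own bytes back.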
> > > Reproducing the bug
> > > *******************
> > >
> > > One can easily reproduce the bug. Open file AddImageToPDF.java and
> > > move the following line:
> > >
> > >    PDPageContentStream contentStream =
> > >        new PDPageContentStream(doc, page, true, true);
> > >
> > > immediately after the line in which the PDPage object is fetched:
> > >
> > >    PDPage page =
> > >        (PDPage)doc.getDocumentCatalog().getAllPages().get( 0 );
> > >
> > > With this modification, one will still get a PDF file, but Acrobat
> > > Reader will report that the image could not be processed. BTW, the
> > > files AddImageToPDF.java and ImageToPDF.java are almost identical.
> > > One of them should be deleted.
> > >
> > > Bug-Fix
> > > *******
> > >
> > > The problem can be solved by using a scratch file that is divided
> > > into pages (e.g. of 4 KB). Each PDStream in the scratch file is
> > > then associated with its own list of pages. This list grows as more
> > > data is written to the stream.
> > >
> > > The bug fix requires minimal changes to the existing code. The very
> > > nice RandomAccess interface made this easy.
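A minimal sketch of the address translation such a paged layout needs, assuming each stream keeps a list of physical page numbers in the order they were allocated to it. PageMapping and physicalOffset() are hypothetical names, not part of the attached code.

```java
import java.util.List;

/** Hypothetical translation from a stream-relative position to a scratch-file offset. */
public class PageMapping {

    static final int PAGE_SIZE = 4096; // 4 KB pages, as suggested in the text

    /** Maps a logical position within one stream to a physical offset in the scratch file. */
    public static long physicalOffset(List<Integer> pages, long logicalPos) {
        int pageIndex = (int) (logicalPos / PAGE_SIZE);  // which of the stream's own pages
        int inPage = (int) (logicalPos % PAGE_SIZE);     // offset inside that page
        int physicalPage = pages.get(pageIndex);          // where that page lives in the file
        return (long) physicalPage * PAGE_SIZE + inPage;
    }

    public static void main(String[] args) {
        // This stream owns physical pages 0 and 2; page 1 belongs to another stream.
        List<Integer> stream = List.of(0, 2);
        System.out.println(physicalOffset(stream, 100));   // 100
        System.out.println(physicalOffset(stream, 5000));  // 2 * 4096 + 904 = 9096
    }
}
```

Because every stream resolves positions through its own page list, another stream's pages can sit anywhere in the file without being overwritten or read by mistake.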
> > >
> > > Here is what needs to be changed:
> > >
> > >    - Add the attached "PagedMultiRandomAccessFile.java" to the I/O
> > >      package.
> > >    - Change COSDocument.getScratchFile() to return a
> > >      RandomAccess instance provided by PagedMultiRandomAccessFile:
> > >
> > >        private PagedMultiRandomAccessFile scratchFile = null;
> > >
> > >        [...]
> > >
> > >        public COSDocument(File scratchDir) throws IOException {
> > >                tmpFile = File.createTempFile("pdfbox", "tmp", scratchDir);
> > >                scratchFile = new PagedMultiRandomAccessFile(
> > >                        new RandomAccessFile(tmpFile, "rw"));
> > >        }
> > >
> > >        public COSDocument(RandomAccess file) {
> > >                // scratchFile = file;
> > >                throw new RuntimeException("Not yet implemented."); //$NON-NLS-1$
> > >        }
> > >
> > >        [...]
> > >
> > >        /**
> > >         * Returns a new scratch file.
> > >         *
> > >         * @return the newly created scratch file
> > >         */
> > >        public RandomAccess getScratchFile() {
> > >                return scratchFile.getNewRandomAcess();
> > >        }
> > >
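The following sketch shows the paging idea end to end in a self-contained form: one shared backing store, plus independent views that each keep their own page list and length, so that interleaved writes no longer corrupt each other. It is an in-memory model written for illustration, not the attached PagedMultiRandomAccessFile; PagedScratch, View, newView() and the 4-byte page size are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory paged scratch store shared by several independent views. */
public class PagedScratch {

    static final int PAGE_SIZE = 4; // tiny pages so the demo crosses page boundaries (the text suggests 4 KB)

    private byte[] data = new byte[0]; // backing "file"
    private int pageCount = 0;         // number of pages handed out so far

    /** Allocates the next free page at the end of the backing store. */
    private int allocatePage() {
        data = Arrays.copyOf(data, (pageCount + 1) * PAGE_SIZE);
        return pageCount++;
    }

    /** Returns a new, empty view; each view has its own page list and length. */
    public View newView() {
        return new View();
    }

    public final class View {
        private final List<Integer> pages = new ArrayList<>();
        private int length = 0;

        /** Appends bytes, allocating a fresh page whenever this view's last page is full. */
        public void write(byte[] b) {
            for (byte x : b) {
                if (length == pages.size() * PAGE_SIZE) {
                    pages.add(allocatePage());
                }
                int phys = pages.get(length / PAGE_SIZE) * PAGE_SIZE + length % PAGE_SIZE;
                data[phys] = x;
                length++;
            }
        }

        /** Reads this view's bytes back in logical order by following its page list. */
        public byte[] readAll() {
            byte[] out = new byte[length];
            for (int i = 0; i < length; i++) {
                out[i] = data[pages.get(i / PAGE_SIZE) * PAGE_SIZE + i % PAGE_SIZE];
            }
            return out;
        }
    }

    public static void main(String[] args) {
        PagedScratch scratch = new PagedScratch();
        View image = scratch.newView();
        View content = scratch.newView();
        image.write("IMG1".getBytes(StandardCharsets.US_ASCII));
        content.write("q Q ".getBytes(StandardCharsets.US_ASCII)); // interleaved write
        image.write("IMG2".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(image.readAll(), StandardCharsets.US_ASCII));   // prints "IMG1IMG2"
        System.out.println(new String(content.readAll(), StandardCharsets.US_ASCII)); // prints "q Q "
    }
}
```

Here the image view ends up owning pages 0 and 2 while the content view owns page 1, yet each reads back exactly what it wrote. The sketch models interleaving within a single thread only; it does no locking, so concurrent use from several threads would need additional synchronization.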
> > > One of the COSDocument constructors takes a RandomAccess file. This
> > > constructor is only called in a single location, namely, in method
> > > PDFParser.parse(). I am not sure whether the RandomAccess parameter
> > > provided there is really a scratch file. Someone will have to decide
> > > what to do with this one.
> > >
> > > The code has been thoroughly tested and has been used in the
> > > production of several books without any problems.
> > >
> > > Please find the code attached. There is also a JUnit test that was
> > > used to debug my code. I have added an Apache license header and
> > > adopted PDFBox's code style. Feel free to make any desired changes.
> > >
> > > Best regards,
> > >
> > > Stefan Mücke
> > >
> > >
> >
>
>
>
>
