[ 
https://issues.apache.org/jira/browse/PDFBOX-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-344.
-------------------------------------


> PushbackInputStream returns partial strings
> -------------------------------------------
>
>                 Key: PDFBOX-344
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-344
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 0.7.3
>         Environment: Mac OS X 10.5
>            Reporter: John F. Walsh
>             Fix For: 0.8.0-incubator
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> When org.pdfbox.pdfparser.BaseParser.parseDirObject() checks whether it is
> reading the string "false" from pdfSource, the check can fail if there is a
> pause in the underlying read of the PDF file.
> org.pdfbox.io.PushBackInputStream extends java.io.PushbackInputStream.
> java.io.PushbackInputStream.read(byte[] b, int off, int len) can return
> fewer bytes than requested, for example "fals" instead of "false", if there
> is a pause in the read of the PDF file being processed. (The PDF file that
> caused this problem can't be shared because it contains customer data.)
> The solution is to retry the read until either enough bytes have been read
> or an EOF has been reached, in which case the bytes read so far should be
> returned. Adding the function override, below, to
> org.pdfbox.io.PushBackInputStream fixes the problem.
> I rated this bug Major because, though it's a show stopper when it happens, I 
> suspect it's quite rare. But, in a production system, it matters.
> -------------------------------------
>     /**
>      * Reads up to <code>len</code> bytes of data from this input stream into
>      * an array of bytes.  This method first reads any pushed-back bytes;
>      * after that, if fewer than <code>len</code> bytes have been read then it
>      * reads from the underlying input stream.  This method blocks until the
>      * requested number of bytes have been read, or until the end of the
>      * stream has been reached, in which case it returns the number of bytes
>      * actually read, or -1 if zero bytes were read.
>      *
>      * This overridden function enables
>      * <tt>org.pdfbox.pdfparser.BaseParser</tt> to be assured that it has the
>      * entire string it's checking for (typically "true" or "false") instead
>      * of receiving a part of the string due to a pause in the underlying
>      * stream read.
>      *
>      * @param      b     the buffer into which the data is read.
>      * @param      off   the start offset of the data.
>      * @param      len   the maximum number of bytes read.
>      * @return     the total number of bytes read into the buffer, or
>      *             <code>-1</code> if there is no more data because the end of
>      *             the stream has been reached.
>      * @exception  IOException  if an I/O error occurs.
>      * @see        java.io.PushbackInputStream#read(byte[], int, int)
>      */
>     public int read(byte[] b, int off, int len) throws IOException {
>         int bytesRead = super.read(b, off, len);
>         /* if we received the expected number of bytes, or an EOF, return
>            what we got: */
>         if ((bytesRead == len) || (bytesRead == -1)){
>             return bytesRead;
>         }
>         
>         int byteRead = 0;
>         while (bytesRead < len){
>             /* if we're missing some bytes, read them one at a time
>                 until we have the required number or an EOF is read. */
>             byteRead = super.read();
>             if (byteRead == -1){
>                 /* If it's an EOF, return what we got and report the EOF
>                     on the next read: */
>                 return bytesRead;
>             }
>             /* Add the byte to the array and loop. */
>             b[off + bytesRead] = (byte)byteRead;
>             bytesRead++;
>         }
>         /* Report the full read complete: */
>         return bytesRead;
>     }
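
The short read described above can be reproduced without the original PDF. The following is a minimal sketch, not PDFBox code: TrickleInputStream, readFully, source and ShortReadDemo are hypothetical names introduced here, and the 4-byte chunk size is an assumption chosen to mimic a pause after "fals". It shows a single read() on a java.io.PushbackInputStream coming back short, and a retry loop that keeps reading until len bytes or EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class ShortReadDemo {

    /** Hypothetical test stream: hands out at most maxChunk bytes per bulk read. */
    static class TrickleInputStream extends FilterInputStream {
        private final int maxChunk;

        TrickleInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Simulate a pause in the source by capping each bulk read.
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /**
     * Retries the bulk read until len bytes have arrived or EOF is hit.
     * Returns the number of bytes read, or -1 if EOF was hit before any byte.
     */
    static int readFully(InputStream in, byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n == -1) {
                break; // EOF: report what we have so far
            }
            total += n;
        }
        return (total == 0 && len > 0) ? -1 : total;
    }

    /** Wraps text in a PushbackInputStream over a stream that yields 4 bytes at a time. */
    static InputStream source(String text) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        return new PushbackInputStream(
                new TrickleInputStream(new ByteArrayInputStream(data), 4));
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[5];

        // One read() call: may return fewer bytes than requested.
        int plain = source("false").read(buf, 0, 5);
        System.out.println("single read: " + plain + " bytes");

        // Retry loop: keeps reading until 5 bytes or EOF.
        int full = readFully(source("false"), buf, 0, 5);
        System.out.println("looped read: " + full + " bytes, \""
                + new String(buf, 0, full, StandardCharsets.US_ASCII) + "\"");
    }
}
```

The loop writes at off + total, so a non-zero offset is respected, and it retries with bulk reads rather than one byte at a time. Either approach satisfies the contract described in the report; the bulk variant just makes fewer calls.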

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
