[ 
https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147713#comment-14147713
 ] 

Daniel Scheibe commented on PDFBOX-2350:
----------------------------------------

I did some more testing. If we want to work around this issue (i.e. be 
fault tolerant like other parsers), I would propose the following fix, which 
needs to be tested carefully:

In {{PDType1Font.java}}, directly below these lines:

{code}
COSStream stream = fontFile.getStream();
int length = stream.getInt(COSName.LENGTH);
int length1 = stream.getInt(COSName.LENGTH1);
int length2 = stream.getInt(COSName.LENGTH2);

// the PFB embedded as two segments back-to-back
byte[] bytes = fontFile.getByteArray();
{code}

I would suggest inserting a call to my "workaround" function:

{code}
length1 = adjustInvalidStreamLength1IfNecessary(bytes, length1);
{code}

where the function is defined as:

{code}
private int adjustInvalidStreamLength1IfNecessary(byte[] bytes, int length1)
{
    final byte[] marker =
    {
       'e', 'x', 'e', 'c'
    };

    // a negative or tiny length1 cannot contain the marker; returning early
    // also keeps Arrays.copyOfRange from throwing on a negative length
    if (length1 < marker.length)
    {
        return length1;
    }

    // grab first segment from bytes indicated by length1
    final byte[] segment1 = Arrays.copyOfRange(bytes, 0, length1);

    // determine the last possible offset of the marker
    int offset = length1 - marker.length;

    // now we scan backwards from the end of the first segment to determine
    // if there is the 'exec' marker available
    while (offset > 0)
    {
        if (segment1[offset] == marker[0] && segment1[offset + 1] == marker[1]
                && segment1[offset + 2] == marker[2] && segment1[offset + 3] == marker[3])
        {
            // we hit the exec marker, there might be additional cr/lf
            // characters which we skip
            offset += 4;
            while (offset < length1 && (segment1[offset] == 13 || segment1[offset] == 10))
            {
                offset++;
            }

            break;
        }

        offset--;
    }

    // let's check if the original length1 value differs from our newly
    // determined offset; for safety we also check that the offset is not zero,
    // as zero means no marker was found and is very unlikely to be correct.
    // if so, we warn and return the corrected value (which hopefully is 100%
    // correct)
    if (length1 - offset != 0 && offset > 0)
    {
        LOG.warn("The length1 of " + length1 + " reported by the stream header "
                + "seems to be incorrect (off by " + (length1 - offset)
                + " byte(s)), adjusted to " + offset);

        return offset;
    }

    // no discrepancy between length1 and offset, so the header seems to be
    // correct already
    return length1;
}
{code}
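
To make the behaviour easier to check outside of PDFBox, here is a minimal standalone sketch of the same idea run against a synthetic font header. The class name {{Length1Workaround}}, the method name {{adjustLength1}} and the sample data are made up for illustration and are not part of PDFBox. Unlike the function above, the sketch clamps the scan to the available bytes instead of copying the segment, and it leaves out the logging.

```java
import java.nio.charset.StandardCharsets;

public class Length1Workaround
{
    private static final byte[] MARKER = "exec".getBytes(StandardCharsets.US_ASCII);

    // Returns a corrected length1 if an "exec" marker is found inside the
    // first length1 bytes, otherwise returns length1 unchanged.
    public static int adjustLength1(byte[] bytes, int length1)
    {
        // never look past the data that is actually there (sketch-only choice)
        int end = Math.min(length1, bytes.length);

        // scan backwards for the last "exec" inside the first segment
        for (int offset = end - MARKER.length; offset > 0; offset--)
        {
            if (matchesAt(bytes, offset))
            {
                int corrected = offset + MARKER.length;
                // skip CR/LF characters that follow the marker
                while (corrected < end && (bytes[corrected] == '\r' || bytes[corrected] == '\n'))
                {
                    corrected++;
                }
                return corrected;
            }
        }
        // no marker found, keep the value from the stream dictionary
        return length1;
    }

    private static boolean matchesAt(byte[] bytes, int offset)
    {
        for (int i = 0; i < MARKER.length; i++)
        {
            if (bytes[offset + i] != MARKER[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        // synthetic clear-text segment (30 bytes) followed by 8 "binary" bytes
        byte[] head = "%!FontType1\ncurrentfile eexec\n".getBytes(StandardCharsets.US_ASCII);
        byte[] data = java.util.Arrays.copyOf(head, head.length + 8);

        // a length1 that is 3 bytes too long is pulled back to the real end
        System.out.println(adjustLength1(data, head.length + 3)); // prints 30
        // a correct length1 is returned unchanged
        System.out.println(adjustLength1(data, head.length));     // prints 30
    }
}
```

Note that the marker {{exec}} also matches the tail of {{eexec}}, which is what normally ends the clear-text segment. A length1 that is too short (so that the marker lies beyond it) is not detected by this approach.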

> Type1 Parser hangs indefinitely
> -------------------------------
>
>                 Key: PDFBOX-2350
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2350
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.0
>         Environment: Windows 7, JDK 1.7.0_51-b13
>            Reporter: Daniel Scheibe
>         Attachments: PDFBOX-2350-289451-endless.pdf
>
>
> When rendering the first page of my pdf document the Type1Parser 
> (org.apache.fontbox.type1.Type1Parser) hangs in a loop in 
> {{parseBinary(byte[] bytes) throws IOException}}
> and "kills" our rendering pipeline. Please find the loop that hangs below:
>         // find /Private dict
>         while (!lexer.peekToken().getText().equals("Private"))
>         {
>             lexer.nextToken();
>         }
> There is no token named "Private" ever in the list of returned tokens 
> (they're empty all the time).  
> Furthermore, going deeper into the source code, it seems the class reading 
> the tokens (Type1Lexer) never advances the buffer position and always 
> returns an empty name token in the readToken(Token prevToken) method.
> Looking at the decrypted buffer, I cannot get anything useful out of it based 
> on my current understanding.
> Unfortunately I cannot provide the pdf in question as it contains confidential 
> data.
> Acrobat Reader XI Version 11.0.08 renders the document just fine.
> In addition it seems the pdf was encrypted (40-Bit RC4) with an empty 
> password and says it's pdf version 1.5.
> Does this provide enough information, or can I do anything else to help 
> nail this one down?
> I guess this might be a pdf document structure/feature that is not yet 
> supported completely but at least pdfbox should throw an exception instead of 
> failing "silently"...
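
As an illustration of the "throw an exception instead of hanging" idea from the description above, here is a rough sketch of a guarded scan loop. The {{TokenSource}} interface and the {{skipTo}} method are hypothetical stand-ins invented for this example and do not reflect the actual Type1Lexer API. The assumption is that the lexer can report its read position, so a call that consumes nothing can be detected.

```java
import java.io.IOException;

public class GuardedTokenScan
{
    /** Hypothetical stand-in for a lexer; not the FontBox Type1Lexer API. */
    public interface TokenSource
    {
        String nextToken(); // next token text, or null at end of input
        int position();     // current read position in the underlying buffer
    }

    // Consumes tokens up to and including the one equal to name. Fails with
    // an IOException instead of looping forever if the input ends or the
    // lexer stops advancing.
    public static void skipTo(TokenSource lexer, String name) throws IOException
    {
        while (true)
        {
            int before = lexer.position();
            String token = lexer.nextToken();
            if (token == null)
            {
                throw new IOException("'" + name + "' not found before end of input");
            }
            if (token.equals(name))
            {
                return;
            }
            if (lexer.position() <= before)
            {
                throw new IOException("lexer made no progress at position " + before);
            }
        }
    }

    public static void main(String[] args)
    {
        // a lexer that is stuck: always an empty token, never advancing
        TokenSource stuck = new TokenSource()
        {
            public String nextToken() { return ""; }
            public int position() { return 0; }
        };
        try
        {
            skipTo(stuck, "Private");
        }
        catch (IOException e)
        {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```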



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
