[ https://issues.apache.org/jira/browse/PDFBOX-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr reopened PDFBOX-3186:
-------------------------------------

Parsing fails with the file from PDFBOX-2521. The problem is that your change
allows "no space". Previously, the condition was "one space and one digit".
Here's some code I wrote yesterday but didn't commit because I wasn't sure
whether the change was in the right place:
{code}
    private long checkXRefStreamOffset(long startXRefOffset, boolean checkOnly) throws IOException
    {
        // repair mode isn't available in non-lenient mode
        if (!isLenient || startXRefOffset == 0)
        {
            return startXRefOffset;
        }
        // seek to offset-1
        source.seek(startXRefOffset-1);
        int nextValue = source.read();
        // the byte before the offset has to be whitespace; after any further
        // whitespace, a digit has to follow
        if (isWhitespace(nextValue))
        {
            skipSpaces();
            if (isDigit())
            {
                try
                {
                    // it's an XRef stream
                    readObjectNumber();
                    readGenerationNumber();
                    readExpectedString(OBJ_MARKER, true);
                    source.seek(startXRefOffset);
                    return startXRefOffset;
                }
                catch (IOException exception)
                {
                    // there wasn't an XRef stream object,
                    // try to repair the offset
                    source.seek(startXRefOffset);
                }
            }
        }
        // try to find a fixed offset
        return checkOnly ? -1 : calculateXRefFixedOffset(startXRefOffset, true);
    }
{code}
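
To make the difference between the conditions concrete, here is a minimal standalone sketch. It is not PDFBox code: the class XRefOffsetSketch, its methods and the sample bytes are made up for illustration, and digitOnly is only an assumed stand-in for a check that accepts "no space", not the committed change. It covers just the first-character condition, not the later parsing of object number, generation number and the obj marker.

```java
import java.nio.charset.StandardCharsets;

// Standalone illustration only; none of these names exist in PDFBox.
final class XRefOffsetSketch
{
    // PDF whitespace characters: NUL, TAB, LF, FF, CR and SPACE
    static boolean isWhitespace(int c)
    {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    static boolean isDigit(int c)
    {
        return c >= '0' && c <= '9';
    }

    // Condition from the code above: the byte at offset-1 is whitespace,
    // any further whitespace is skipped, and the next byte is a digit.
    static boolean whitespaceThenDigit(byte[] data, int offset)
    {
        if (offset < 1 || offset > data.length || !isWhitespace(data[offset - 1] & 0xFF))
        {
            return false;
        }
        int pos = offset;
        while (pos < data.length && isWhitespace(data[pos] & 0xFF))
        {
            pos++;
        }
        return pos < data.length && isDigit(data[pos] & 0xFF);
    }

    // Hypothetical looser check that accepts "no space": only the byte
    // at the offset itself is inspected.
    static boolean digitOnly(byte[] data, int offset)
    {
        return offset >= 0 && offset < data.length && isDigit(data[offset] & 0xFF);
    }

    public static void main(String[] args)
    {
        // made-up sample: the object "12 0 obj" starts at index 8
        byte[] data = "endobj\r\n12 0 obj\n".getBytes(StandardCharsets.US_ASCII);
        for (int offset = 6; offset <= 9; offset++)
        {
            System.out.println(offset + ": whitespaceThenDigit=" + whitespaceThenDigit(data, offset)
                    + ", digitOnly=" + digitOnly(data, offset));
        }
    }
}
```

In this sample, offsets 7 (one byte before the object) and 8 (the exact start) satisfy whitespaceThenDigit, while offset 9, which points into the middle of the object number, is accepted only by digitOnly.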

> Parsing fails when XRef stream object is 1 byte later
> -----------------------------------------------------
>
>                 Key: PDFBOX-3186
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3186
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: PDFBOX-3186-2DVGHWIAXQBSMQIYJ2QM3I5EIILTVUIC.pdf, 
> PDFBOX-3186-4OMJHT6CV7IFKANYBXJRVJBVFGHI7YQ3.pdf
>
>
> The attached files don't parse properly - their only problem is that the XRef 
> object starts 1 byte after the offset mentioned at the end of the file.


