[
https://issues.apache.org/jira/browse/PDFBOX-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862365#action_12862365
]
[email protected] commented on PDFBOX-276:
----------------------------------------------
Andreas,
I just realized that there is another case that my patch does not handle.
If the unbalanced parentheses issue happens on the last line in the object
block, then the character to test for should be a '>' as well as a '/'.
But we don't have a test case for this.
Examples of possible issues:
/CreationDate (1/8/2003 12:18:53 \)
>>
or
/CreationDate ((1/8/2003 12:18:53 )
>>
It would mean adding the following change to the patch.
nextThreeBytes[1] == 0x0a && // Look for a new line
+ (nextThreeBytes[2] == 0x2f || nextThreeBytes[2] == 0x3e)) || // Look for a slash / or a >
+ // Add a second case without a new line
+ (nextThreeBytes[0] == 0x0d && // Look for a carriage return
+ (nextThreeBytes[1] == 0x2f || nextThreeBytes[1] == 0x3e))) // Look for a slash / or a >
+
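To make the intended condition easier to read, here is a minimal standalone sketch of the check. It is not the actual BaseParser code: the class name ParenLookahead, the method names, and the assumption that exactly three lookahead bytes are available after the closing parenthesis are all made up for illustration.

```java
// Hypothetical helper, not part of PDFBox: decides whether the bytes that
// follow a ')' suggest the string really ended there despite unbalanced
// parentheses.
public final class ParenLookahead {
    private static final byte CR = 0x0d;     // carriage return
    private static final byte LF = 0x0a;     // new line
    private static final byte SLASH = 0x2f;  // '/' starts the next name
    private static final byte GT = 0x3e;     // '>' starts the closing ">>"

    private ParenLookahead() {
    }

    // True if the byte can begin the token that follows the string.
    private static boolean startsNextToken(byte b) {
        return b == SLASH || b == GT;
    }

    // Assumes nextThreeBytes holds the three bytes read after the ')'.
    public static boolean looksLikeEndOfString(byte[] nextThreeBytes) {
        if (nextThreeBytes == null || nextThreeBytes.length != 3) {
            return false;
        }
        if (nextThreeBytes[0] != CR) {
            return false;
        }
        // Case 1: carriage return, new line, then '/' or '>'
        if (nextThreeBytes[1] == LF && startsNextToken(nextThreeBytes[2])) {
            return true;
        }
        // Case 2: carriage return directly followed by '/' or '>'
        return startsNextToken(nextThreeBytes[1]);
    }
}
```

With this sketch, the bytes 0x0d 0x0a 0x3e (the last-line case above) and 0x0d 0x2f 0x43 (no new line) are both accepted, while a line that starts with only 0x0a is still not covered, same as in the patch.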
Any thoughts?
Peter
> IOException on parsing a PDF file
> ---------------------------------
>
> Key: PDFBOX-276
> URL: https://issues.apache.org/jira/browse/PDFBOX-276
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Priority: Minor
> Fix For: 1.2.0
>
> Attachments: BaseParser.java, pdfbox-276-baseparser-patch-938120.txt,
> PDFBOX276-NotIndexedDocument.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1722594
> Originally submitted by doublep-enw on 2007-05-21 05:10.
> When parsing the attached file, PDFBox throws the following exception:
> java.io.IOException: expected='/' actual='?'--1
> org.pdfbox.io.pushbackinputstr...@159f498
> at org.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:774)
> at org.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:217)
> at org.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:910)
> at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:432)
> at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
> The file does look strange inside, but PDF viewers don't seem to care.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1722594&file_id=229983
> NotIndexedDocument.pdf (application/pdf), 8728 bytes
> unparseable file