Petras created PDFBOX-2810:
------------------------------
Summary: Indirect object marked as direct by PDFParser
Key: PDFBOX-2810
URL: https://issues.apache.org/jira/browse/PDFBOX-2810
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.9
Reporter: Petras
I've noticed an issue with PDFParser: it marks the COSObject that represents
an indirect reference to a COSDictionary as "direct", while the dereferenced
object (the COSDictionary) correctly reports the indirect state.
Consider this extract from a PDF:
{code}
1 0 obj
<<
/Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<<
/Type /Outlines
/Count 0
>>
endobj
3 0 obj
<<
/Type /Pages
/Kids [4 0 R]
/Count 1
>>
endobj
4 0 obj
<<
/Type /Page
...
>>
endobj
{code}
Reading the catalog dictionary entry "{{/Outlines 2 0 R}}":
{code}
final COSDictionary cosCatalog = catalog.getCOSDictionary();
// WORKS with dereferencing
final COSBase dictOutlines = cosCatalog.getDictionaryObject(COSName.OUTLINES);
Assert.assertFalse("Expected /Outlines indirect", dictOutlines.isDirect());
{code}
{color:red} FAILS without dereferencing{color}
{code}
Assert.assertFalse("Expected /Outlines indirect",
cosCatalog.getItem(COSName.OUTLINES).isDirect());
{code}
The culprit is the code in
{{org.apache.pdfbox.pdfparser.BaseParser#parseCOSDictionary}}, which always
sets a COSObject containing a COSDictionary as "direct".
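To make the difference between the two lookups concrete, here is a minimal, self-contained sketch. It does not use PDFBox: Base, Ref and Dict are hypothetical stand-ins for COSBase, COSObject and COSDictionary, and buildCatalog is an assumed simplification of the parsing step described above, not the actual BaseParser code.

```java
import java.util.HashMap;
import java.util.Map;

public class DirectFlagSketch {

    /** Stand-in for COSBase: it only carries the "direct" flag. */
    static class Base {
        private boolean direct;
        boolean isDirect() { return direct; }
        void setDirect(boolean direct) { this.direct = direct; }
    }

    /** Stand-in for COSObject: wraps the target of a reference such as "2 0 R". */
    static class Ref extends Base {
        private final Base target;
        Ref(Base target) { this.target = target; }
        Base getObject() { return target; }
    }

    /** Stand-in for COSDictionary with the two lookup styles used in the report. */
    static class Dict extends Base {
        private final Map<String, Base> items = new HashMap<>();
        void setItem(String key, Base value) { items.put(key, value); }
        // Returns the stored value as is (a Ref stays a Ref).
        Base getItem(String key) { return items.get(key); }
        // Returns the stored value, following a Ref to its target.
        Base getDictionaryObject(String key) {
            Base value = items.get(key);
            return (value instanceof Ref) ? ((Ref) value).getObject() : value;
        }
    }

    /**
     * Builds a catalog holding "/Outlines 2 0 R".
     * flagReferences == true models the reported behaviour (every value stored
     * in a dictionary is flagged direct); false models what the report expects.
     */
    static Dict buildCatalog(boolean flagReferences) {
        Dict outlines = new Dict();      // body of "2 0 obj", never flagged direct
        Base value = new Ref(outlines);  // the "2 0 R" entry
        if (flagReferences || !(value instanceof Ref)) {
            value.setDirect(true);
        }
        Dict catalog = new Dict();
        catalog.setItem("Outlines", value);
        return catalog;
    }

    public static void main(String[] args) {
        Dict reported = buildCatalog(true);
        System.out.println(reported.getDictionaryObject("Outlines").isDirect()); // false
        System.out.println(reported.getItem("Outlines").isDirect());             // true

        Dict expected = buildCatalog(false);
        System.out.println(expected.getItem("Outlines").isDirect());             // false
    }
}
```

In this model the flag lives on the wrapper and on the target separately, which is why the dereferencing lookup passes while the raw lookup fails.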
I also noticed that when an indirect COSObject is a member of a COSArray, its
state is not changed to "direct". This code works when reading an array
element with or without dereferencing:
{code}
// /Pages 3 0 R
final COSDictionary dictPages = (COSDictionary)
cosCatalog.getDictionaryObject(COSName.PAGES);
// /Kids [4 0 R]
final COSBase objKids = dictPages.getDictionaryObject(COSName.KIDS);
// WORKS without dereference
COSBase firstElement = ((COSArray) objKids).get(0);
Assert.assertFalse("Expected /Kids array element is indirect object",
firstElement.isDirect());
// WORKS with dereference
firstElement = ((COSArray) objKids).getObject(0);
Assert.assertFalse("Expected /Kids array element is indirect object",
firstElement.isDirect());
{code}