Mike Cantrell created PDFBOX-5433:
-------------------------------------
Summary: PDFStreamEngine creating new operators that do not exist
in document
Key: PDFBOX-5433
URL: https://issues.apache.org/jira/browse/PDFBOX-5433
Project: PDFBox
Issue Type: Bug
Reporter: Mike Cantrell
Attachments: pdfbox-stream-engine-operators.zip
We're using PDFStreamEngine to analyze and filter (optimize) the document's
content streams. I've found an odd case where a form XObject yields extra
(unwanted) operators that don't exist in the original stream.
According to the PDFDebugger, the form's stream has the following contents:
{code:java}
0 TL
q
BT
1 0 0 rg
0 i
/TT0 20 Tf
0 Tc
0 Tw
0 Ts
100 Tz
0 Tr
0 -15.791 TD
(HOODHD035236) Tj
ET
Q{code}
I created a debug utility to print the operators that PDFStreamEngine reports:
{code:java}
@Getter
static class StreamDebugger extends PDFStreamEngine {
    String formName;
    Operator operator;
    List<COSBase> operands;
    int operatorCount;

    public StreamDebugger() {
        addOperator(new BeginText());
        addOperator(new Concatenate());
        addOperator(new DrawObject()); // special text version
        addOperator(new EndText());
        addOperator(new SetGraphicsStateParameters());
        addOperator(new Save());
        addOperator(new Restore());
        addOperator(new NextLine());
        addOperator(new SetCharSpacing());
        addOperator(new MoveText());
        addOperator(new MoveTextSetLeading());
        addOperator(new SetFontAndSize());
        addOperator(new ShowText());
        addOperator(new ShowTextAdjusted());
        addOperator(new SetTextLeading());
        addOperator(new SetMatrix());
        addOperator(new SetTextRenderingMode());
        addOperator(new SetTextRise());
        addOperator(new SetWordSpacing());
        addOperator(new SetTextHorizontalScaling());
        addOperator(new ShowTextLine());
        addOperator(new ShowTextLineAndSpace());
    }

    @Override
    public void showForm(PDFormXObject form) throws IOException {
        this.formName = ((COSName) operands.get(0)).getName();
        super.showForm(form);
        this.formName = null;
    }

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands)
            throws IOException {
        this.operator = operator;
        this.operands = operands;
        if (Objects.equals(this.formName, "Fm0")) {
            this.operatorCount++;
            System.out.printf("%s:%s%n", operator.getName(),
                    operands.toString());
        }
        super.processOperator(operator, operands);
    }
}
{code}
The resulting output:
{code:java}
TL:[COSInt{0}]
q:[]
BT:[]
rg:[COSInt{1}, COSInt{0}, COSInt{0}]
i:[COSInt{0}]
Tf:[COSName{TT0}, COSInt{20}]
Tc:[COSInt{0}]
Tw:[COSInt{0}]
Ts:[COSInt{0}]
Tz:[COSInt{100}]
Tr:[COSInt{0}]
TD:[COSInt{0}, COSFloat{-15.791}]
TL:[COSFloat{15.791}]
Td:[COSInt{0}, COSFloat{-15.791}]
Tj:[COSString{HOODHD035236}]
ET:[]
Q:[] {code}
These operators do not exist in the original stream:
{code:java}
TL:[COSFloat{15.791}]
Td:[COSInt{0}, COSFloat{-15.791}]{code}
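One observation that may help narrow this down: the PDF specification defines "tx ty TD" as having the same effect as "-ty TL" followed by "tx ty Td", and the two extra operators match that expansion of the "0 -15.791 TD" line exactly. If the TD handler passes that equivalent pair back through processOperator (an assumption, not verified here against the PDFBox source), the override above would log them as if they came from the stream. The sketch below only reproduces the arithmetic of that expansion. The class name TdExpansion and its helper methods are hypothetical and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TdExpansion {

    // Expands "tx ty TD" into the pair of operators the PDF specification
    // describes as equivalent: "-ty TL" followed by "tx ty Td".
    static List<String> expandTD(double tx, double ty) {
        List<String> ops = new ArrayList<>();
        ops.add(fmt(-ty) + " TL");
        ops.add(fmt(tx) + " " + fmt(ty) + " Td");
        return ops;
    }

    // Simplified number formatting for this illustration only: whole values
    // print as integers, everything else with three decimal places.
    static String fmt(double v) {
        if (v == Math.rint(v)) {
            return String.valueOf((long) v);
        }
        return String.format(Locale.ROOT, "%.3f", v);
    }

    public static void main(String[] args) {
        // The TD line from the form's content stream: 0 -15.791 TD
        expandTD(0, -15.791).forEach(System.out::println);
    }
}
```

Running main prints "15.791 TL" and then "0 -15.791 Td", the same operator and operand values as the two unexpected entries in the engine output.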
Rewriting the stream from the operators reported by the engine causes display
issues in the resulting PDF.
I'm attaching a test case which demonstrates the issue.