Mike Cantrell created PDFBOX-5433:
-------------------------------------

             Summary: PDFStreamEngine creating new operators that do not exist 
in document
                 Key: PDFBOX-5433
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5433
             Project: PDFBox
          Issue Type: Bug
            Reporter: Mike Cantrell
         Attachments: pdfbox-stream-engine-operators.zip

We're using PDFStreamEngine to analyze and filter (optimize) the document's 
content streams. I've found an odd case where the engine gives us extra 
(unwanted) operators for a form that don't exist in the form's original stream.

According to the PDFDebugger, the form's stream has the following contents:

 
{code:java}
0 TL
q
  BT
    1 0 0 rg
    0 i
    /TT0 20 Tf
    0 Tc
    0 Tw
    0 Ts
    100 Tz
    0 Tr
    0 -15.791 TD
    (HOODHD035236) Tj
  ET
Q{code}
I created a debug utility to print the operators reported by the PDFStreamEngine:
{code:java}
@Getter
static class StreamDebugger extends PDFStreamEngine {

    String formName;
    Operator operator;
    List<COSBase> operands;
    int operatorCount;

    public StreamDebugger() {
        addOperator(new BeginText());
        addOperator(new Concatenate());
        addOperator(new DrawObject()); // special text version
        addOperator(new EndText());
        addOperator(new SetGraphicsStateParameters());
        addOperator(new Save());
        addOperator(new Restore());
        addOperator(new NextLine());
        addOperator(new SetCharSpacing());
        addOperator(new MoveText());
        addOperator(new MoveTextSetLeading());
        addOperator(new SetFontAndSize());
        addOperator(new ShowText());
        addOperator(new ShowTextAdjusted());
        addOperator(new SetTextLeading());
        addOperator(new SetMatrix());
        addOperator(new SetTextRenderingMode());
        addOperator(new SetTextRise());
        addOperator(new SetWordSpacing());
        addOperator(new SetTextHorizontalScaling());
        addOperator(new ShowTextLine());
        addOperator(new ShowTextLineAndSpace());
    }

    @Override
    public void showForm(PDFormXObject form) throws IOException {
        this.formName = ((COSName) operands.get(0)).getName();
        super.showForm(form);
        this.formName = null;
    }

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands)
            throws IOException {
        this.operator = operator;
        this.operands = operands;
        if (Objects.equals(this.formName, "Fm0")) {
            this.operatorCount++;
            System.out.printf("%s:%s%n", operator.getName(),
                    operands.toString());
        }
        super.processOperator(operator, operands);
    }
} {code}
The resulting output:
{code:java}
TL:[COSInt{0}]
q:[]
BT:[]
rg:[COSInt{1}, COSInt{0}, COSInt{0}]
i:[COSInt{0}]
Tf:[COSName{TT0}, COSInt{20}]
Tc:[COSInt{0}]
Tw:[COSInt{0}]
Ts:[COSInt{0}]
Tz:[COSInt{100}]
Tr:[COSInt{0}]
TD:[COSInt{0}, COSFloat{-15.791}]
TL:[COSFloat{15.791}]
Td:[COSInt{0}, COSFloat{-15.791}]
Tj:[COSString{HOODHD035236}]
ET:[]
Q:[] {code}
These operators do not exist in the original stream:
{code:java}
TL:[COSFloat{15.791}]
Td:[COSInt{0}, COSFloat{-15.791}]{code}
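
For reference, the two extra entries match how the PDF specification defines TD: "tx ty TD" has the same effect as "-ty TL" followed by "tx ty Td". A plausible explanation, stated here as an assumption and not verified against the PDFBox source, is that the engine's TD handler dispatches those two operators back through processOperator, so the override sees them as if they were in the stream. A minimal sketch of that expansion in plain Java (TdExpansion and expand are hypothetical names, not PDFBox API):

```java
import java.util.List;

class TdExpansion {

    // Expands "tx ty TD" into the equivalent pair given by the PDF
    // specification: "-ty TL" followed by "tx ty Td".
    // Hypothetical helper for illustration only, not part of PDFBox.
    static List<String> expand(double tx, double ty) {
        double leading = 0.0 - ty; // TD sets the text leading to -ty
        return List.of(leading + " TL", tx + " " + ty + " Td");
    }

    public static void main(String[] args) {
        // Operands taken from the form stream above: 0 -15.791 TD
        expand(0, -15.791).forEach(System.out::println);
    }
}
```

With the operands from the form stream, the sketch yields a TL of 15.791 and a Td of 0 -15.791, the same values as the two unexpected entries in the debug output.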
If you re-write the stream using the operators reported by the engine, the 
resulting PDF has display issues.
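
One possible workaround when re-writing a stream from the recorded operators is to drop a TL/Td pair that directly follows a TD and merely repeats what TD already implies ("-ty TL", then "tx ty Td"). The sketch below is a heuristic under that assumption, not a fix in PDFBox itself. Op and dropDerived are hypothetical names, the operands are simplified to plain numbers, and a genuine TL/Td pair in the stream that happens to match would be dropped as well.

```java
import java.util.ArrayList;
import java.util.List;

class DerivedOperatorFilter {

    // Hypothetical value type for this sketch: operator name plus numeric operands.
    static final class Op {
        final String name;
        final double[] args;

        Op(String name, double... args) {
            this.name = name;
            this.args = args;
        }
    }

    // Tolerant comparison, since recorded operands may have gone through float math.
    private static boolean close(double a, double b) {
        return Math.abs(a - b) < 1e-4;
    }

    // Keeps every operator except a TL/Td pair that immediately follows a TD
    // and carries exactly the values that TD implies.
    static List<Op> dropDerived(List<Op> recorded) {
        List<Op> kept = new ArrayList<>();
        for (int i = 0; i < recorded.size(); i++) {
            Op op = recorded.get(i);
            kept.add(op);
            if (!op.name.equals("TD") || op.args.length != 2 || i + 2 >= recorded.size()) {
                continue;
            }
            Op tl = recorded.get(i + 1);
            Op td = recorded.get(i + 2);
            boolean derivedTl = tl.name.equals("TL") && tl.args.length == 1
                    && close(tl.args[0], -op.args[1]);
            boolean derivedTd = td.name.equals("Td") && td.args.length == 2
                    && close(td.args[0], op.args[0])
                    && close(td.args[1], op.args[1]);
            if (derivedTl && derivedTd) {
                i += 2; // skip the derived pair
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Op> recorded = List.of(
                new Op("TD", 0, -15.791),
                new Op("TL", 15.791),
                new Op("Td", 0, -15.791),
                new Op("ET"));
        for (Op op : dropDerived(recorded)) {
            System.out.println(op.name);
        }
    }
}
```

Matching on the operand values, and not only on the operator names, reduces the chance of removing a TL or Td that the original stream really contains.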

I'm attaching a test case which demonstrates the issue. 

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
