[
https://issues.apache.org/jira/browse/PDFBOX-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536914#comment-17536914
]
Mike Cantrell commented on PDFBOX-5433:
---------------------------------------
Thanks for the clarification. I guess I'm still confused as to why
MoveTextSetLeading calls back into the context with the other two operators
(TL and Td) instead of handling only the single operator that was encountered
(TD). With this design, it seems that I must register a handler for either TD
or Td, but not both, or I will get effective duplicates.
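
To illustrate the duplication and one possible way around it, here is a
minimal stand-alone sketch. It does not use PDFBox types: the class
DepthFilterSketch, its fields, and its handle method are hypothetical stand-ins
for an engine whose TD handler re-dispatches TL and Td. The idea is to track
nesting depth, so that only operators dispatched at depth 0 (those read from
the stream itself) are recorded. Whether the same wrapper fits around the real
processOperator override is an assumption I have not verified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone model (no PDFBox types) of an engine whose TD handler re-dispatches TL and Td. */
public class DepthFilterSketch {
    /** Operators seen at nesting depth 0, i.e. those read from the stream itself. */
    final List<String> fromStream = new ArrayList<>();
    /** Every operator dispatched, including ones synthesized by a handler. */
    final List<String> all = new ArrayList<>();
    private int depth = 0;

    void processOperator(String name, List<Double> operands) {
        all.add(name);
        if (depth == 0) {
            fromStream.add(name);
        }
        depth++;
        try {
            handle(name, operands);
        } finally {
            depth--;
        }
    }

    /** Hypothetical handler table: only TD does anything, mirroring the behaviour discussed above. */
    private void handle(String name, List<Double> operands) {
        if (name.equals("TD")) {
            double ty = operands.get(1);
            processOperator("TL", List.of(-ty)); // leading is the negated vertical offset
            processOperator("Td", operands);     // same operands as the original TD
        }
    }

    public static void main(String[] args) {
        DepthFilterSketch engine = new DepthFilterSketch();
        engine.processOperator("BT", List.of());
        engine.processOperator("TD", List.of(0.0, -15.791));
        engine.processOperator("Tj", List.of());
        engine.processOperator("ET", List.of());
        System.out.println("all:         " + engine.all);
        System.out.println("from stream: " + engine.fromStream);
    }
}
```

In this model the "all" list contains TD, TL and Td, while the "from stream"
list contains only TD, which is what a stream rewriter would want to emit.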
> PDFStreamEngine creating new operators that do not exist in document
> --------------------------------------------------------------------
>
> Key: PDFBOX-5433
> URL: https://issues.apache.org/jira/browse/PDFBOX-5433
> Project: PDFBox
> Issue Type: Bug
> Reporter: Mike Cantrell
> Priority: Major
> Attachments: pdfbox-stream-engine-operators.zip
>
>
> We're using PDFStreamEngine to do some analysis and filtering (optimizations)
> of the document's content streams. I've found an odd case where a form is
> giving us extra (unwanted) operators that don't exist in the original stream.
> According to the PDFDebugger, the form's stream has the following contents:
>
> {code:java}
> 0 TL
> q
> BT
> 1 0 0 rg
> 0 i
> /TT0 20 Tf
> 0 Tc
> 0 Tw
> 0 Ts
> 100 Tz
> 0 Tr
> 0 -15.791 TD
> (HOODHD035236) Tj
> ET
> Q{code}
> I created a debug utility to output the operators given by the PDFStreamEngine:
> {code:java}
> @Getter
> static class StreamDebugger extends PDFStreamEngine {
>     String formName;
>     Operator operator;
>     List<COSBase> operands;
>     int operatorCount;
>
>     public StreamDebugger() {
>         addOperator(new BeginText());
>         addOperator(new Concatenate());
>         addOperator(new DrawObject()); // special text version
>         addOperator(new EndText());
>         addOperator(new SetGraphicsStateParameters());
>         addOperator(new Save());
>         addOperator(new Restore());
>         addOperator(new NextLine());
>         addOperator(new SetCharSpacing());
>         addOperator(new MoveText());
>         addOperator(new MoveTextSetLeading());
>         addOperator(new SetFontAndSize());
>         addOperator(new ShowText());
>         addOperator(new ShowTextAdjusted());
>         addOperator(new SetTextLeading());
>         addOperator(new SetMatrix());
>         addOperator(new SetTextRenderingMode());
>         addOperator(new SetTextRise());
>         addOperator(new SetWordSpacing());
>         addOperator(new SetTextHorizontalScaling());
>         addOperator(new ShowTextLine());
>         addOperator(new ShowTextLineAndSpace());
>     }
>
>     @Override
>     public void showForm(PDFormXObject form) throws IOException {
>         this.formName = ((COSName) operands.get(0)).getName();
>         super.showForm(form);
>         this.formName = null;
>     }
>
>     @Override
>     protected void processOperator(Operator operator, List<COSBase> operands)
>             throws IOException {
>         this.operator = operator;
>         this.operands = operands;
>         if (Objects.equals(this.formName, "Fm0")) {
>             this.operatorCount++;
>             System.out.printf("%s:%s%n", operator.getName(),
>                     operands.toString());
>         }
>         super.processOperator(operator, operands);
>     }
> } {code}
> The resulting output:
> {code:java}
> TL:[COSInt{0}]
> q:[]
> BT:[]
> rg:[COSInt{1}, COSInt{0}, COSInt{0}]
> i:[COSInt{0}]
> Tf:[COSName{TT0}, COSInt{20}]
> Tc:[COSInt{0}]
> Tw:[COSInt{0}]
> Ts:[COSInt{0}]
> Tz:[COSInt{100}]
> Tr:[COSInt{0}]
> TD:[COSInt{0}, COSFloat{-15.791}]
> TL:[COSFloat{15.791}]
> Td:[COSInt{0}, COSFloat{-15.791}]
> Tj:[COSString{HOODHD035236}]
> ET:[]
> Q:[] {code}
> These operators do not exist in the original stream:
> {code:java}
> TL:[COSFloat{15.791}]
> Td:[COSInt{0}, COSFloat{-15.791}]{code}
> If you rewrite the stream from the operators reported by the engine, the
> resulting PDF has display issues.
> I'm attaching a test case which demonstrates the issue.
>
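
One plausible reading of the display issue in the quoted description: TD
already sets the leading and moves to the next line, so writing TD, TL and Td
back out applies the move twice. The sketch below models only the vertical
line offset and the leading. TextPositionSketch and its apply method are
hypothetical, and it assumes an identity text line matrix; it is not PDFBox
code.

```java
/** Stand-alone model of the text line origin; not PDFBox code. */
public class TextPositionSketch {
    double y = 0;        // vertical offset of the text line origin
    double leading = 0;  // current text leading (set by TL or TD)

    void apply(String op, double... args) {
        switch (op) {
            case "TL": leading = args[0]; break;
            case "Td": y += args[1]; break;
            case "TD": leading = -args[1]; y += args[1]; break; // TD = TL followed by Td
            default: break; // other operators do not move the line origin in this model
        }
    }

    public static void main(String[] a) {
        // Operators as they appear in the original stream.
        TextPositionSketch original = new TextPositionSketch();
        original.apply("TD", 0, -15.791);

        // Operators as reported by the engine, written back verbatim.
        TextPositionSketch rewritten = new TextPositionSketch();
        rewritten.apply("TD", 0, -15.791);
        rewritten.apply("TL", 15.791);
        rewritten.apply("Td", 0, -15.791);

        System.out.println("original y  = " + original.y);
        System.out.println("rewritten y = " + rewritten.y);
    }
}
```

In this model the rewritten sequence ends up twice as far down as the original
(roughly -31.582 instead of -15.791), while the leading is the same in both
cases.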