[ https://issues.apache.org/jira/browse/PDFBOX-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116151#comment-17116151 ]
Maruan Sahyoun commented on PDFBOX-4846:
----------------------------------------

From the Spec:
{quote}
(PDF 1.5) A marked-content sequence (see 14.6, “Marked Content”), through an ActualText entry in a property list attached to the marked-content sequence with a Span tag.
{quote}
and further
{quote}
The ActualText value shall be used as a replacement, not a description, for the content, providing text that is equivalent to what a person would see when viewing the content
{quote}

> Capital Letter in PDF appear as small letter in PDFBox
> ------------------------------------------------------
>
>                 Key: PDFBOX-4846
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4846
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.19
>         Environment: Windows 7
>            Reporter: William Au Yeung
>            Priority: Major
>         Attachments: 0493_ltn201903291811.pdf, PDFBOX-4846-p2.pdf,
> image-2020-05-25-18-21-56-276.png
>
>
> the wording in page two, "CONSOLIDATED STATEMENT OF PROFIT OR LOSS",
> extracted as "CONSOLIDATED STATEMENT OF pROFIT OR LOSS" by pdfbox
> the wording in page four, "CONSOLIDATED STATEMENT OF FINANICAL POSITION",
> extracted as
> "CONSOLIDATED STATEMENT OF FINANICAL pOSITION" by pdfbox
>
> why have such behaviour?
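
To illustrate the replacement rule quoted from the spec in the comment above, here is a minimal Java sketch. It is a toy model only and does not use or describe the PDFBox API: the class ActualTextSketch, the MarkedContent type, and the extract method are hypothetical names, and the sample strings are invented (a ligature glyph with an ActualText of "fi"), not taken from the attached PDFs.

```java
import java.util.Arrays;
import java.util.List;

public class ActualTextSketch {

    /**
     * Hypothetical model of one marked-content sequence: the text of the
     * glyphs drawn on the page, plus an optional ActualText entry
     * (null when the property list has none).
     */
    public static final class MarkedContent {
        final String glyphText;
        final String actualText;

        public MarkedContent(String glyphText, String actualText) {
            this.glyphText = glyphText;
            this.actualText = actualText;
        }
    }

    /**
     * Concatenates the text of all sequences. When honorActualText is true,
     * a non-null ActualText replaces the glyph text of its sequence, which
     * is the "replacement, not a description" rule from the spec quote.
     */
    public static String extract(List<MarkedContent> sequences, boolean honorActualText) {
        StringBuilder out = new StringBuilder();
        for (MarkedContent mc : sequences) {
            if (honorActualText && mc.actualText != null) {
                out.append(mc.actualText);
            } else {
                out.append(mc.glyphText);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Invented sample: the middle sequence draws the single ligature
        // glyph U+FB01 and carries ActualText "fi".
        List<MarkedContent> page = Arrays.asList(
                new MarkedContent("of", null),
                new MarkedContent("\uFB01", "fi"),
                new MarkedContent("ce", null));

        System.out.println(extract(page, true));   // ActualText replaces the ligature
        System.out.println(extract(page, false));  // raw glyph text, ligature kept
    }
}
```

Under this model, whatever an ActualText entry contains is what ends up in the extracted text for that span, regardless of how the glyphs look on the page.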