[ 
https://issues.apache.org/jira/browse/PDFBOX-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520126#comment-17520126
 ] 

Michael Klink commented on PDFBOX-5411:
---------------------------------------

{quote}it could make use of glyph size to disambiguate "easy" cases like this 
one{quote}
In this example, disambiguating by glyph size would produce better output. 
In other cases, though, it would produce worse output, e.g. in a poor man's 
caps/small caps emulation, where a single word mixes two font sizes.

Of course, your example also offers other hints: slightly different baselines, 
overlapping glyph drawings, and different colors. No single hint would suffice 
by itself, but all of them together probably would.
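
To illustrate both sides, here is a minimal sketch of a size-only split. It 
does not use PDFBox: {{Glyph}} is a hypothetical stand-in for {{TextPosition}} 
(in PDFBox the values would presumably come from {{getXDirAdj()}} and 
{{getFontSizeInPt()}}), and {{splitBySize}} and its tolerance parameter are 
assumptions made for this example, not an existing or proposed API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SizeSplitter {
    // Hypothetical stand-in for TextPosition, reduced to what the heuristic needs.
    static final class Glyph {
        final String text;
        final float x;        // left edge of the glyph
        final float fontSize; // font size in points

        Glyph(String text, float x, float fontSize) {
            this.text = text;
            this.x = x;
            this.fontSize = fontSize;
        }
    }

    // Groups the glyphs of one visual line into runs of similar font size,
    // then reads each run left to right.
    static List<String> splitBySize(List<Glyph> glyphs, float tolerance) {
        List<List<Glyph>> runs = new ArrayList<>();
        for (Glyph g : glyphs) {
            List<Glyph> match = null;
            for (List<Glyph> run : runs) {
                // The first glyph of a run serves as its reference size.
                if (Math.abs(run.get(0).fontSize - g.fontSize) <= tolerance) {
                    match = run;
                    break;
                }
            }
            if (match == null) {
                match = new ArrayList<>();
                runs.add(match);
            }
            match.add(g);
        }
        List<String> result = new ArrayList<>();
        for (List<Glyph> run : runs) {
            run.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            for (Glyph g : run) {
                sb.append(g.text);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two overlapping texts: large "AB" and small "1234".
        List<Glyph> overlapping = Arrays.asList(
                new Glyph("A", 0f, 24f), new Glyph("1", 5f, 8f),
                new Glyph("2", 12f, 8f), new Glyph("3", 19f, 8f),
                new Glyph("4", 26f, 8f), new Glyph("B", 30f, 24f));
        System.out.println(splitBySize(overlapping, 1.0f)); // [AB, 1234]

        // Emulated small caps: one word, two font sizes.
        List<Glyph> smallCaps = Arrays.asList(
                new Glyph("T", 0f, 12f), new Glyph("I", 8f, 9.6f),
                new Glyph("T", 12f, 9.6f), new Glyph("L", 19f, 9.6f),
                new Glyph("E", 25f, 9.6f));
        System.out.println(splitBySize(smallCaps, 1.0f)); // [T, ITLE]
    }
}
```

The first input is separated as hoped, but the small caps word is torn apart 
into "T" and "ITLE", which is why size would have to be combined with the 
other hints.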

> PDFTextStripper could use text size in reconstruction
> -----------------------------------------------------
>
>                 Key: PDFBOX-5411
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5411
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.25, 3.0.0 PDFBox
>            Reporter: Lapo Luchini
>            Priority: Minor
>         Attachments: image-2022-04-08-16-13-17-334.png, textDoubleText.pdf
>
>
> When two texts are partially overlapping {{PDFTextStripper}} seems to return 
> a mix simply based on "leftmost x coordinate of the glyph", which makes 
> sense, but it could make use of glyph size to disambiguate "easy" cases like 
> this one:
> !image-2022-04-08-16-13-17-334.png!
> Currently, this is the first parameter of PDFTextStripper.writeString(String 
> string, List<TextPosition> textPositions):
> {{"T0510E09620_S368b3aT92-29fa -4Leef-80I5e-N53c23efE7979f"}}
> I would of course hope for two calls:
> {{"TEST LINE"}}
> {{"051009620_368b3a92-29fa-4eef-805e-53c23ef7979f"}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
