[
https://issues.apache.org/jira/browse/PDFBOX-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr closed PDFBOX-3370.
-----------------------------------
Resolution: Not A Problem
Assignee: Tilman Hausherr
It's not a bug, it's a feature:
Please open the file I uploaded in Adobe Reader and try to select an "l"
(or any other glyph). You will notice that the selection rectangle is much
larger than the glyph. PDFBox suppresses overlapping identical characters,
and that feature is on by default. You can switch it off with
stripper.setSuppressDuplicateOverlappingText(false).
{code}
* By default the text stripper will attempt to remove text that overlaps
* each other. Word paints the same character several times in order to make
* it look bold. By setting this to false all text will be extracted, which
* means that certain sections will be duplicated, but better performance
* will be noticed.
{code}
If you have control over the generation of the file, use a font that has
correct font metrics (horiAdvance, horiBearingX).
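
To illustrate why an oversized glyph rectangle can make the second "l"
disappear, here is a minimal sketch of duplicate-overlap suppression. It is
a simplified model, not PDFBox's actual code: the Glyph class, the extract
method and the "one third of the reported width" tolerance are assumptions
made for this example. The only real PDFBox call involved is
stripper.setSuppressDuplicateOverlappingText(false), which corresponds to
passing false as the second argument here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of duplicate-overlap suppression; NOT PDFBox's real code.
class OverlapSuppressionSketch {

    // Hypothetical glyph: its text, position, and the width the font reports.
    static final class Glyph {
        final String text;
        final float x;
        final float y;
        final float width;

        Glyph(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    // Drops a glyph when an identical glyph was already kept within the
    // tolerance. Assumption: tolerance is one third of the reported width.
    static String extract(List<Glyph> glyphs, boolean suppressDuplicates) {
        List<Glyph> kept = new ArrayList<>();
        StringBuilder out = new StringBuilder();
        for (Glyph g : glyphs) {
            boolean duplicate = false;
            if (suppressDuplicates) {
                float tolerance = g.width / 3.0f;
                for (Glyph k : kept) {
                    if (k.text.equals(g.text)
                            && Math.abs(k.x - g.x) < tolerance
                            && Math.abs(k.y - g.y) < tolerance) {
                        duplicate = true;
                        break;
                    }
                }
            }
            if (!duplicate) {
                kept.add(g);
                out.append(g.text);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two "l" glyphs drawn 2.5 units apart, font reports a width of 2.5.
        List<Glyph> sane = Arrays.asList(
                new Glyph("l", 10.0f, 0.0f, 2.5f),
                new Glyph("l", 12.5f, 0.0f, 2.5f));
        // Same positions, but the font reports a much larger width of 12.
        List<Glyph> inflated = Arrays.asList(
                new Glyph("l", 10.0f, 0.0f, 12.0f),
                new Glyph("l", 12.5f, 0.0f, 12.0f));

        System.out.println(extract(sane, true));      // prints "ll"
        System.out.println(extract(inflated, true));  // prints "l"
        System.out.println(extract(inflated, false)); // prints "ll"
    }
}
```

With plausible metrics both glyphs survive. With the inflated width the
second "l" falls inside the tolerance of the first and is treated as a bold
overprint, which matches the behaviour reported in this issue. Turning the
suppression off keeps both glyphs regardless of the metrics.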
> Error reading the double L
> ---------------------------
>
> Key: PDFBOX-3370
> URL: https://issues.apache.org/jira/browse/PDFBOX-3370
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.7, 1.8.8, 1.8.9, 1.8.10, 1.8.11, 1.8.12, 2.0.0, 2.0.1
> Environment: Netbeans 8.1
> Java 7 and Java 8
> Reporter: José Jiménez
> Assignee: Tilman Hausherr
> Priority: Critical
> Labels: extraction, ll, text
> Attachments: PDFBOX-3370-double-l.pdf
>
>
> When reading some PDFs with words containing "LL", the library outputs
> only one of the two Ls. The same test with the iTextPDF library works
> properly.