[ https://issues.apache.org/jira/browse/PDFBOX-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794436#comment-16794436 ]
ASF subversion and git services commented on PDFBOX-4480:
---------------------------------------------------------
Commit 1855687 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1855687 ]
PDFBOX-4480: if ascent and descent are not 0, and if either the height is 0 or
the height is larger than (ascent - descent) / 2, then use that one
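
The rule in the commit message can be sketched as a small standalone method.
This is only an illustration of the wording above, not the code from r1855687:
the class name GlyphHeightSketch, the method chooseGlyphHeight and its three
parameters are hypothetical, and the real change may use other inputs (for
example further font descriptor fields) that this message does not mention.

```java
// Hypothetical sketch of the rule described in the commit message.
// Not the actual PDFBox implementation.
final class GlyphHeightSketch {
    private GlyphHeightSketch() {
    }

    /**
     * Picks the glyph height to use for text positioning.
     *
     * @param height  the height computed so far (assumed input)
     * @param ascent  the font's ascent, 0 if unknown
     * @param descent the font's descent (usually negative), 0 if unknown
     */
    static float chooseGlyphHeight(float height, float ascent, float descent) {
        // Only trust the metrics when both ascent and descent are set.
        if (ascent != 0f && descent != 0f) {
            float fromMetrics = (ascent - descent) / 2f;
            // Use the metrics-based value when there is no height at all,
            // or when the current height is larger than that value.
            if (height == 0f || height > fromMetrics) {
                return fromMetrics;
            }
        }
        return height;
    }
}
```

Under these assumptions, a height of 1000 with ascent 700 and descent -200 is
replaced by (700 + 200) / 2 = 450, while a height of 300 with the same metrics
is kept as is.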
> Problem extracting text in newline characters and spaces between words
> ----------------------------------------------------------------------
>
> Key: PDFBOX-4480
> URL: https://issues.apache.org/jira/browse/PDFBOX-4480
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.13
> Environment: macOS
> Reporter: ANIL SANGHANI
> Priority: Major
> Labels: textextraction
> Attachments: Document.txt, Narasimhan S.pdf
>
>
>
> I have a PDF file. When I try to extract its text using
> It ignores some newline characters between lines, so the last word of a line
> and the first word of the next line appear as one word with no space between
> them.
> For example, in the attached PDF:
> main Bsk as mainBsk
> [[email protected] Bangalore|mailto:[email protected]]
> as [email protected]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)