[ https://issues.apache.org/jira/browse/PDFBOX-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924898#comment-17924898 ]
Tilman Hausherr commented on PDFBOX-2425:
-----------------------------------------

Extraction in 2.0.0:
{noformat}
A F r a m e w o r k for D i s t r i b u t e d A u t h o r i z a t i o n *
(Extended Abstract)
T h o m a s Y . C . W o o S i m o n S. L a m
D e p a r t m e n t of C o m p u t e r S c i en ces
T h e U n i v e r s i t y of T e x a s a t A u s t i n
A u s t i n , T e x a s 78712-1188
1 I n t r o d u c t i o n
{noformat}
While this isn't perfect, it looks much less messy than it did initially. The cause is the spacing of the source document.

I'd say this was fixed in 2.0.0. (The label was applied before the release.) I'll add it to my regression test set.

> Extracted text has extra spaces
> -------------------------------
>
> Key: PDFBOX-2425
> URL: https://issues.apache.org/jira/browse/PDFBOX-2425
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.7, 1.8.10, 1.8.11, 2.0.0
> Reporter: John Hewson
> Priority: Major
> Attachments: WooLam93c-Visible-p1.pdf, WooLam93c.pdf
>
>
> This is a very old issue, originally from PDFBOX-37. The attached file has
> extra spaces inserted in the title text by PDFTextStripper.
> {code}
> A Framework for D i s t r i bu t ed Au thor i z a t i on*
> (Extended Abstract)
> Thoma s Y .C . Woo S imon S. L am
> Depa r tmen t of Compu t e r Sc i ences
> Th e Un i v e r s i t y of T ex a s a t Au s t i n
> Au s t i n , T exa s 78712-1188
> 1 In t r oduc t i on
> {code}