[ 
https://issues.apache.org/jira/browse/PDFBOX-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924898#comment-17924898
 ] 

Tilman Hausherr commented on PDFBOX-2425:
-----------------------------------------

Extraction in 2.0.0:
{noformat}
A F r a m e w o r k  for D i s t r i b u t e d  A u t h o r i z a t i o n *
(Extended Abstract)
T h o m a s  Y . C .  W o o  S i m o n  S. L a m
D e p a r t m e n t  of  C o m p u t e r  S c i en ces
T h e  U n i v e r s i t y  of  T e x a s  a t  A u s t i n
A u s t i n ,  T e x a s  78712-1188
1 I n t r o d u c t i o n
{noformat}
While this isn't perfect, it looks much less messy than it did initially. The 
cause is the spacing of the source document. I'd say this was fixed in 2.0.0 
(the label was applied before the release). I'll add it to my regression test 
set.

> Extracted text has extra spaces
> -------------------------------
>
>                 Key: PDFBOX-2425
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2425
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.7, 1.8.10, 1.8.11, 2.0.0
>            Reporter: John Hewson
>            Priority: Major
>         Attachments: WooLam93c-Visible-p1.pdf, WooLam93c.pdf
>
>
> This is a very old issue, originally from PDFBOX-37. The attached file has 
> extra spaces inserted in the title text by PDFTextStripper.
> {code}
> A Framework  for D i s t r i bu t ed  Au thor i z a t i on*  
> (Extended Abstract) 
> Thoma s  Y .C .  Woo  S imon  S. L am  
> Depa r tmen t  of  Compu t e r  Sc i ences  
> Th e  Un i v e r s i t y  of  T ex a s  a t  Au s t i n  
> Au s t i n ,  T exa s  78712-1188  
> 1 In t r oduc t i on  
> {code}


