[ https://issues.apache.org/jira/browse/PDFBOX-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973394#comment-14973394 ]

Andreas Lehmkühler commented on PDFBOX-3061:
--------------------------------------------

It looks like a well-known issue: the calculation of the space width isn't
accurate. If the space width is reduced to, let's say, half of the calculated
value, everything is fine. We have to rethink our assumptions when calculating
the space width.
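
A minimal sketch of why a larger space width merges words, assuming a simplified rule in which a word break is emitted when the horizontal gap between consecutive glyphs exceeds `SPACING_TOLERANCE * spaceWidth`, with an assumed tolerance of 0.5. This is not PDFBox's actual PDFTextStripper logic (which is more involved); the class, record and method names are hypothetical. Positions, widths and both space widths are copied from the PrintTextLocations output quoted below (Java 16+ for the record).

```java
import java.util.List;

/**
 * Hypothetical, simplified word-break heuristic for illustration only;
 * it is NOT PDFBox's actual PDFTextStripper logic.
 */
public class SpaceWidthDemo {
    // Assumed tolerance: a gap wider than half the space width counts as a word break.
    static final double SPACING_TOLERANCE = 0.5;

    /** One glyph as reported by PrintTextLocations: character, x position, width. */
    record Glyph(char ch, double x, double width) {}

    /** Rebuilds the text, inserting ' ' wherever the gap to the previous glyph exceeds the threshold. */
    static String extract(List<Glyph> glyphs, double spaceWidth) {
        double threshold = SPACING_TOLERANCE * spaceWidth;
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                // Gap = start of this glyph minus end of the previous one.
                double gap = g.x() - (prev.x() + prev.width());
                if (gap > threshold) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch());
            prev = g;
        }
        return sb.toString();
    }

    /** x positions and widths copied from the PrintTextLocations output for "day. Some market". */
    static List<Glyph> sampleGlyphs() {
        return List.of(
            new Glyph('d', 36.0, 6.4154396),
            new Glyph('a', 42.41544, 5.2499504),
            new Glyph('y', 47.66539, 5.837944),
            new Glyph('.', 53.503334, 2.6249733),
            new Glyph('S', 60.01537, 5.5124474),
            new Glyph('o', 65.52782, 5.7329483),
            new Glyph('m', 71.260765, 9.271416),
            new Glyph('e', 80.53218, 5.0294495),
            new Glyph('m', 87.505165, 9.271416),
            new Glyph('a', 96.77868, 5.2499466),
            new Glyph('r', 102.028625, 4.147461),
            new Glyph('k', 106.17609, 5.837944),
            new Glyph('e', 112.01403, 5.0294495),
            new Glyph('t', 117.04348, 3.422966));
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = sampleGlyphs();
        System.out.println(extract(glyphs, 3.4298992));     // 1.8 space width: prints "day. Some market"
        System.out.println(extract(glyphs, 5.8666234));     // 2.0 space width: prints "day. Somemarket"
        System.out.println(extract(glyphs, 5.8666234 / 2)); // 2.0 value halved: prints "day. Some market"
    }
}
```

Under these assumptions the gap between "Some" and "market" is about 1.94. That exceeds half of the 1.8 space width (about 1.71) but not half of the 2.0 space width (about 2.93). Halving the 2.0 value lowers the threshold to about 1.47, which matches the observation that everything is fine with half the calculated width.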

> Word concatenation in 2.0 not in 1.8
> ------------------------------------
>
>                 Key: PDFBOX-3061
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3061
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>         Attachments: PDFBOX-3061-092465-reduced.pdf
>
>
> Attached file is reduced from govdocs file 092465.pdf.
> Text extraction with 1.8:
> {code}
> day. Some market watchers were
> {code}
> Text extraction with 2.0:
> {code}
> day. Somemarketwatcherswere
> {code}
> Text extraction with Adobe Reader:
> {code}
> day. Somemarket watchers were
> {code}
> PrintTextLocations 1.8:
> {code}
> String[36.0,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=6.4154396]d
> String[42.41544,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.2499504]a
> String[47.66539,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.837944]y
> String[53.503334,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=2.6249733].
> String[60.01537,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.5124474]S
> String[65.52782,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.7329483]o
> String[71.260765,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=9.271416]m
> String[80.53218,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
> String[87.505165,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=9.271416]m
> String[96.77868,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.2499466]a
> String[102.028625,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.147461]r
> String[106.17609,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.837944]k
> String[112.01403,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
> String[117.04348,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=3.422966]t
> String[122.40893,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=8.75692]w
> String[131.16585,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.249954]a
> String[136.4158,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=3.4229736]t
> String[139.83878,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.661957]c
> String[144.50073,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=6.1109467]h
> String[150.61168,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
> String[155.64113,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.147461]r
> String[159.78859,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.45195]s
> String[166.18617,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=8.756912]w
> String[174.94308,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
> String[179.97253,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.147461]r
> String[184.12,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
> {code}
> PrintTextLocations 2.0:
> {code}
> String[36.0,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=6.4154396]d
> String[42.41544,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.2499504]a
> String[47.66539,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.837944]y
> String[53.503334,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=2.6249733].
> String[60.01537,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.5124474]S
> String[65.52782,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.7329483]o
> String[71.260765,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=9.271416]m
> String[80.53218,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
> String[87.505165,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=9.271416]m
> String[96.77868,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.2499466]a
> String[102.028625,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.147461]r
> String[106.17609,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.837944]k
> String[112.01403,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
> String[117.04348,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=3.422966]t
> String[122.40893,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=8.75692]w
> String[131.16585,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.249954]a
> String[136.4158,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=3.4229736]t
> String[139.83878,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.661957]c
> String[144.50073,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=6.1109467]h
> String[150.61168,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
> String[155.64113,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.147461]r
> String[159.78859,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.45195]s
> String[166.18617,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=8.756912]w
> String[174.94308,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
> String[179.97253,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.147461]r
> String[184.12,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
> {code}
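
In the two PrintTextLocations dumps, the x positions and widths appear identical between 1.8 and 2.0; only `height` and `space` differ. A small sketch for checking inter-glyph gaps directly from that output, assuming one unwrapped entry per line in the format shown. The class, record and method names are hypothetical, and the regex covers only the fields seen in these dumps (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for analysing PrintTextLocations output; not part of PDFBox. */
public class TextLocationGaps {
    // Matches one unwrapped entry, e.g.
    // String[87.505165,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=9.271416]m
    static final Pattern ENTRY = Pattern.compile(
        "String\\[([-\\d.]+),([-\\d.]+) fs=\\S+ xscale=\\S+ height=\\S+ space=([-\\d.]+) width=([-\\d.]+)\\](.*)");

    /** The fields needed for gap analysis: x position, reported space width, glyph width, text. */
    record Entry(double x, double space, double width, String text) {}

    static Entry parse(String line) {
        Matcher m = ENTRY.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized entry: " + line);
        }
        return new Entry(Double.parseDouble(m.group(1)), Double.parseDouble(m.group(3)),
            Double.parseDouble(m.group(4)), m.group(5));
    }

    /** Gap between the end of each glyph and the start of the next one. */
    static List<Double> gaps(List<Entry> entries) {
        List<Double> result = new ArrayList<>();
        for (int i = 1; i < entries.size(); i++) {
            Entry prev = entries.get(i - 1);
            result.add(entries.get(i).x() - (prev.x() + prev.width()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Last glyph of "Some" and first glyph of "market" from the 2.0 output.
        List<Entry> entries = List.of(
            parse("String[80.53218,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e"),
            parse("String[87.505165,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=9.271416]m"));
        double gap = gaps(entries).get(0);
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "gap=%.4f space=%.4f%n", gap, entries.get(1).space()); // prints "gap=1.9435 space=5.8666"
    }
}
```

Feeding all entries of one dump to `gaps` shows near-zero gaps inside words and gaps of roughly 1.94 or more at the visible word boundaries.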



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
