[ https://issues.apache.org/jira/browse/PDFBOX-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-3061:
------------------------------------
Description:
Attached file is reduced from govdocs file 092465.pdf.
Text extraction with 1.8:
{code}
day. Some market watchers were
{code}
Text extraction with 2.0:
{code}
day. Somemarketwatcherswere
{code}
Text extraction with Adobe Reader:
{code}
day. Somemarket watchers were
{code}
PrintTextLocations 1.8:
{code}
String[36.0,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=6.4154396]d
String[42.41544,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.2499504]a
String[47.66539,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.837944]y
String[53.503334,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=2.6249733].
String[60.01537,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.5124474]S
String[65.52782,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.7329483]o
String[71.260765,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=9.271416]m
String[80.53218,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
String[87.505165,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=9.271416]m
String[96.77868,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.2499466]a
String[102.028625,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.147461]r
String[106.17609,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.837944]k
String[112.01403,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
String[117.04348,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=3.422966]t
String[122.40893,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=8.75692]w
String[131.16585,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.249954]a
String[136.4158,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=3.4229736]t
String[139.83878,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.661957]c
String[144.50073,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=6.1109467]h
String[150.61168,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
String[155.64113,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.147461]r
String[159.78859,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.45195]s
String[166.18617,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=8.756912]w
String[174.94308,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
String[179.97253,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=4.147461]r
String[184.12,169.67963 fs=1.0 xscale=10.4999 height=6.3524394 space=3.4298992 width=5.0294495]e
{code}
PrintTextLocations 2.0:
{code}
String[36.0,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=6.4154396]d
String[42.41544,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.2499504]a
String[47.66539,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.837944]y
String[53.503334,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=2.6249733].
String[60.01537,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.5124474]S
String[65.52782,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.7329483]o
String[71.260765,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=9.271416]m
String[80.53218,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
String[87.505165,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=9.271416]m
String[96.77868,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.2499466]a
String[102.028625,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.147461]r
String[106.17609,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.837944]k
String[112.01403,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
String[117.04348,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=3.422966]t
String[122.40893,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=8.75692]w
String[131.16585,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.249954]a
String[136.4158,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=3.4229736]t
String[139.83878,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.661957]c
String[144.50073,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=6.1109467]h
String[150.61168,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
String[155.64113,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.147461]r
String[159.78859,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.45195]s
String[166.18617,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=8.756912]w
String[174.94308,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
String[179.97253,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=4.147461]r
String[184.12,169.67963 fs=1.0 xscale=10.4999 height=6.4035034 space=5.8666234 width=5.0294495]e
{code}
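One way to read the two PrintTextLocations dumps: the glyph x positions and widths are identical in 1.8 and 2.0, and only the reported space width (3.4298992 vs 5.8666234) and height differ. The sketch below is not PDFBox code. It assumes a simplified rule in which a word break is emitted when the gap between the right edge of one glyph and the x position of the next exceeds a fraction of the reported space width. The class name SpaceGapSketch, the method rebuild, and the 0.5 factor are assumptions made for illustration only; the real extraction logic may take other factors into account.

```java
// Illustrative sketch only: a simplified, hypothetical word-break rule,
// not the actual PDFBox text extraction code.
class SpaceGapSketch {
    // x positions copied from the PrintTextLocations output (same in 1.8 and 2.0)
    static final float[] X = {
        36.0f, 42.41544f, 47.66539f, 53.503334f, 60.01537f, 65.52782f,
        71.260765f, 80.53218f, 87.505165f, 96.77868f, 102.028625f,
        106.17609f, 112.01403f, 117.04348f, 122.40893f, 131.16585f,
        136.4158f, 139.83878f, 144.50073f, 150.61168f, 155.64113f,
        159.78859f, 166.18617f, 174.94308f, 179.97253f, 184.12f
    };
    // glyph widths copied from the same output (same in 1.8 and 2.0)
    static final float[] W = {
        6.4154396f, 5.2499504f, 5.837944f, 2.6249733f, 5.5124474f, 5.7329483f,
        9.271416f, 5.0294495f, 9.271416f, 5.2499466f, 4.147461f,
        5.837944f, 5.0294495f, 3.422966f, 8.75692f, 5.249954f,
        3.4229736f, 4.661957f, 6.1109467f, 5.0294495f, 4.147461f,
        4.45195f, 8.756912f, 5.0294495f, 4.147461f, 5.0294495f
    };
    // the glyphs in order, without any spaces
    static final String GLYPHS = "day.Somemarketwatcherswere";

    // Assumed rule: insert a space when the gap to the previous glyph's
    // right edge is larger than spaceWidth * tolerance.
    static String rebuild(float spaceWidth, float tolerance) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < X.length; i++) {
            if (i > 0) {
                float gap = X[i] - (X[i - 1] + W[i - 1]);
                if (gap > spaceWidth * tolerance) {
                    sb.append(' ');
                }
            }
            sb.append(GLYPHS.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // space width reported by 1.8, then by 2.0; 0.5 is an assumed tolerance
        System.out.println(rebuild(3.4298992f, 0.5f));
        System.out.println(rebuild(5.8666234f, 0.5f));
    }
}
```

With these inputs the sketch prints "day. Some market watchers were" for the 1.8 space width and "day. Somemarketwatcherswere" for the 2.0 one. The gaps between words are about 1.94, which is above 1.71 (half of 3.43) but below 2.93 (half of 5.87), while the gap after the period is about 3.89 and clears both thresholds. This suggests, but does not prove, that the changed space width is what makes 2.0 drop the spaces. It does not explain the Adobe Reader result.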
was:
Attached file is reduced from govdocs file 092465.pdf.
Text extraction with 1.8:
{code}
day. Some market watchers were
{code}
Text extraction with 2.0:
{code}
day. Somemarketwatcherswere
{code}
Text extraction with Adobe Reader:
{code}
day. Somemarket watchers were
{code}
> Word concatenation in 2.0 not in 1.8
> ------------------------------------
>
> Key: PDFBOX-3061
> URL: https://issues.apache.org/jira/browse/PDFBOX-3061
> Project: PDFBox
> Issue Type: Sub-task
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Attachments: PDFBOX-3061-092465-reduced.pdf
>