[
https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053474#comment-15053474
]
John Hewson edited comment on PDFBOX-3062 at 12/11/15 8:00 PM:
---------------------------------------------------------------
To wrap this one up, revisiting my questions above:
{quote}
1) what is the correct thing to do?
2) what should we do for 2.0?
{quote}
The answers would seem to be:
1) visual bounds, but only if it's not really slow (and we don't know, because we
haven't tried)
2) keep the 1.8 heuristic to be pragmatic
So, we should revisit this later but for 2.0 there's clearly strong support for
keeping the existing heuristic. Issue resolved.
> Text extraction and height different in 2.0
> -------------------------------------------
>
> Key: PDFBOX-3062
> URL: https://issues.apache.org/jira/browse/PDFBOX-3062
> Project: PDFBox
> Issue Type: Sub-task
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: 005021-reduced.pdf,
> PDFBOX-3062-H6NIYQXHLPGD3GI6SNIYINRAZBCDHUCB-reduced-marked-1.png,
> PDFBOX-3062-H6NIYQXHLPGD3GI6SNIYINRAZBCDHUCB-reduced.pdf,
> PDFBOX-3062-H6NIYQXHLPGD3GI6SNIYINRAZBCDHUCB.pdf,
> PDFBOX-3062-N2MOQ7YZICIYGTPLQJAWJ4HLN6CCEMHZ-reduced.pdf, garbled text 2.pdf
>
>
> AR:
> {code}
> WITH THE increasing complexity of optical modules,
> {code}
> 1.8:
> {code}
> WITH THE increasing complexity of optical modules,
> String[39.6,399.6 fs=1.0 xscale=29.888 height=20.114626 space=7.472 width=28.214272]W
> String[69.488,386.16 fs=1.0 xscale=9.963 height=6.5955067 space=2.49075 width=3.3176804]I
> String[72.80568,386.16 fs=1.0 xscale=9.963 height=6.5955067 space=2.49075 width=6.0873947]T
> String[78.893074,386.16 fs=1.0 xscale=9.963 height=6.5955067 space=2.49075 width=7.1932907]H
> String[90.71916,386.16 fs=1.0 xscale=9.963 height=6.5955067 space=2.49075 width=6.0873947]T
> String[96.80656,386.16 fs=1.0 xscale=9.963 height=6.5955067 space=2.49075 width=7.1932907]H
> {code}
> 2.0:
> {code}
> W
> ITH THE increasing complexity of optical modules,
> String[39.6,399.6 fs=1.0 xscale=29.888 height=9.584274 space=7.472 width=28.209717]W
> String[69.488,386.16 fs=1.0 xscale=9.963 height=3.194865 space=2.49075 width=3.3177567]I
> String[72.805756,386.16 fs=1.0 xscale=9.963 height=3.194865 space=2.49075 width=6.0858]T
> String[78.891556,386.16 fs=1.0 xscale=9.963 height=3.194865 space=2.49075 width=7.1949615]H
> String[90.719315,386.16 fs=1.0 xscale=9.963 height=3.194865 space=2.49075 width=6.0858]T
> String[96.805115,386.16 fs=1.0 xscale=9.963 height=3.194865 space=2.49075 width=7.1949615]H
> {code}
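
One way to see why the smaller heights reported by 2.0 could split the drop cap onto its own line: a line-grouping step typically checks whether two glyphs overlap vertically. The following is a minimal sketch under stated assumptions, not PDFBox's actual code. The class `LineOverlapSketch` and the method `overlaps` are hypothetical, each glyph is assumed to cover the vertical range from `y - height` to `y`, and the 0.1 tolerance is an arbitrary choice. The numbers are the `W` and `I` entries copied from the 1.8 and 2.0 dumps above.

```java
// Simplified illustration only; names and tolerance are assumptions,
// not taken from the PDFBox source.
final class LineOverlapSketch {

    /**
     * Returns true if the vertical extents of two glyphs overlap.
     * Each glyph is assumed to span [y - height, y].
     */
    static boolean overlaps(float y1, float height1, float y2, float height2) {
        // Baselines that are nearly identical count as the same line.
        boolean sameBaseline = Math.abs(y1 - y2) < 0.1f;
        // The second baseline falls inside the first glyph's extent.
        boolean secondInsideFirst = y2 <= y1 && y2 >= y1 - height1;
        // The first baseline falls inside the second glyph's extent.
        boolean firstInsideSecond = y1 <= y2 && y1 >= y2 - height2;
        return sameBaseline || secondInsideFirst || firstInsideSecond;
    }

    public static void main(String[] args) {
        // Drop cap "W" (y=399.6) against the following "I" (y=386.16).
        boolean with18 = overlaps(399.6f, 20.114626f, 386.16f, 6.5955067f);
        boolean with20 = overlaps(399.6f, 9.584274f, 386.16f, 3.194865f);
        System.out.println("1.8 heights overlap: " + with18); // true
        System.out.println("2.0 heights overlap: " + with20); // false
    }
}
```

Under these assumptions the 1.8 heights make the two glyphs overlap, so they would be grouped on one line, while the 2.0 heights do not, which is consistent with the `W` being emitted on a separate line in the 2.0 output.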