[ https://issues.apache.org/jira/browse/PDFBOX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966334#comment-14966334 ]

Andreas Lehmkühler edited comment on PDFBOX-2584 at 10/23/15 5:34 PM:
----------------------------------------------------------------------

I've run a test using 1.8.8. IMHO everything is OK; every extracted character has a non-zero width:

{code}
String[21.36,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.890444]F
String[26.160799,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=5.7788887]C
String[31.918877,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=2.6653328]-
String[34.5682,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.450226]0
String[39.002415,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.450226]0
String[43.43663,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.450226]7
String[120.48,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.890442]F
String[125.2824,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.450218]L
String[129.71822,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=6.227112]O
String[135.95653,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=7.555771]W
String[143.45467,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=2.225113] 
String[145.67258,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=5.778885]C
String[151.42825,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=6.227112]O
String[157.60654,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=5.778885]N
String[163.29738,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.890442]T
String[168.1558,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=5.778885]R
String[173.97151,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=6.227112]O
String[180.08977,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.450226]L
String[685.44,195.96002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.4501953]9
String[689.8822,195.96002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.4501953]0
String[694.3244,195.96002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.4501953]5
String[698.70654,195.96002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.4501953]6
String[703.2088,195.96002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.4501953]7
String[707.651,195.96002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.4501953]8
String[624.0,183.96002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=5.7788696]N
String[624.0,213.18 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=5.7788696]N
{code}
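The non-zero widths in this output are also consistent with the usual PDF text-space arithmetic: a glyph's advance width from the font metrics (in 1/1000 of text space) times the font size (`fs`) times the horizontal scale (`xscale`). For example, 611 / 1000 * 1.0 * 8.004 = 4.890444, the width reported for the first `F`. The sketch below checks output lines against that relation. It assumes the font has Helvetica/Arial-compatible AFM widths (F/T=611, C/N/R=722, O=778, W=944, L and digits=556, hyphen=333, space=278); the thread does not name the font in stip_2c.pdf. `WidthCheck`, `expectedWidth` and `matches` are hypothetical helpers, not PDFBox API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: parses one "String[x,y fs=.. xscale=.. height=.. space=.. width=..]c"
 * line and compares the reported width with advance / 1000 * fs * xscale.
 */
final class WidthCheck {
    // Assumption: Helvetica/Arial-compatible AFM advance widths (1/1000 text space units)
    // for the characters that appear in the output above.
    static final Map<Character, Integer> ADVANCE = new HashMap<>();
    static {
        ADVANCE.put('F', 611); ADVANCE.put('T', 611);
        ADVANCE.put('C', 722); ADVANCE.put('N', 722); ADVANCE.put('R', 722);
        ADVANCE.put('O', 778); ADVANCE.put('W', 944); ADVANCE.put('L', 556);
        ADVANCE.put('-', 333); ADVANCE.put(' ', 278);
        for (char d = '0'; d <= '9'; d++) {
            ADVANCE.put(d, 556);
        }
    }

    // Captures fs, xscale, width and the trailing character of one output line.
    private static final Pattern LINE = Pattern.compile(
        "String\\[\\S+ fs=(\\S+) xscale=(\\S+) height=\\S+ space=\\S+ width=([^\\]]+)\\](.)");

    /** Width predicted from the font metrics: advance / 1000 * fontSize * xScale. */
    static double expectedWidth(int advance, double fontSize, double xScale) {
        return advance / 1000.0 * fontSize * xScale;
    }

    /** True if the width reported on the line matches the prediction within tol. */
    static boolean matches(String line, double tol) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized line: " + line);
        }
        double fs = Double.parseDouble(m.group(1));
        double xScale = Double.parseDouble(m.group(2));
        double width = Double.parseDouble(m.group(3));
        char c = m.group(4).charAt(0);
        Integer advance = ADVANCE.get(c);
        if (advance == null) {
            throw new IllegalArgumentException("no assumed metrics for '" + c + "'");
        }
        return Math.abs(width - expectedWidth(advance, fs, xScale)) <= tol;
    }

    public static void main(String[] args) {
        String[] sample = {
            "String[21.36,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=4.890444]F",
            "String[31.918877,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=2.6653328]-",
            "String[143.45467,184.08002 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122 width=2.225113] ",
        };
        for (String line : sample) {
            System.out.println(matches(line, 1e-3) + "  " + line);
        }
    }
}
```

Running `main` prints `true` for each of the three sample lines; a zero width, as in the original report, would make `matches` return `false`.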


was (Author: lehmi):
I didn't have time to check the 1.8.x branch yesterday. Maybe it's fixed in 1.8.11 as well? Pavel is complaining about 1.8.8. I'm going to check that later.

> Text extraction reports zero character widths 
> ----------------------------------------------
>
>                 Key: PDFBOX-2584
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2584
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.8
>            Reporter: Pavel Misurkin
>         Attachments: stip_2c.pdf
>
>
> We are using the PDFBox API to get the positions of characters within a document.
> We have found a problem with one document: text extraction extracts the text
> properly but sets all characters' widths to zero.
> The code is pretty simple:
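> {code}
> import java.io.File;
> import java.io.StringWriter;
> import java.io.Writer;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.util.PDFTextStripper; // 1.8.x package (2.0 moved it to org.apache.pdfbox.text)
>
>             File input = new File("stip_2c.pdf");
>             PDDocument document = PDDocument.load(input);
>
>             PDFTextStripper extractor = new PDFTextStripper();
>             Writer output = new StringWriter();
>             extractor.writeText(document, output);
> {code}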
> We then examine the value of the Extractor.charactersByArticle member for the
> characters' widths:
> - We first found the issue in 1.8.4:
> all character widths were zero.
> - In version 1.8.8
> all character widths were zero except for whitespace.
> See the new validation added in 1.8.8 in file
> pdfbox-1.8.8-src\pdfbox\src\main\java\org\apache\pdfbox\util\PDFStreamEngine.java,
> line 369:
> {code}
>         if (spaceWidthText == 0)
>         {
>             spaceWidthText = 1.0f; // if could not find font, use a generic value
>         }
> {code}
> - In version 2.0.0 the problem still exists.
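The diagnosis in the report comes down to counting zero-width glyphs in charactersByArticle (a list of articles, each a list of text positions), keeping whitespace separate from everything else. A minimal sketch of that bookkeeping follows. `ZeroWidthScan` and `Glyph` are hypothetical names; `Glyph` stands in for PDFBox's `TextPosition` so the example runs without the library, and the sample data in `main` is made up to mimic the 1.8.8 symptom.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: counts zero-width glyphs in a charactersByArticle-shaped structure. */
final class ZeroWidthScan {
    /** Stand-in for TextPosition: only the text and the reported width. */
    static final class Glyph {
        final String text;
        final float width;

        Glyph(String text, float width) {
            this.text = text;
            this.width = width;
        }
    }

    /** Returns {zeroWidthNonWhitespace, zeroWidthWhitespace, totalGlyphs}. */
    static int[] countZeroWidths(List<List<Glyph>> articles) {
        int zeroOther = 0;
        int zeroSpace = 0;
        int total = 0;
        for (List<Glyph> article : articles) {
            for (Glyph g : article) {
                total++;
                if (g.width == 0f) {
                    // Text that trims to empty (ASCII spaces/control chars) counts as whitespace.
                    if (g.text.trim().isEmpty()) {
                        zeroSpace++;
                    } else {
                        zeroOther++;
                    }
                }
            }
        }
        return new int[] {zeroOther, zeroSpace, total};
    }

    public static void main(String[] args) {
        // Made-up data mimicking the 1.8.8 report: only the space has a non-zero width.
        List<List<Glyph>> reported = Arrays.asList(Arrays.asList(
            new Glyph("F", 0f), new Glyph("L", 0f), new Glyph(" ", 2.225113f), new Glyph("C", 0f)));
        System.out.println(Arrays.toString(countZeroWidths(reported)));
    }
}
```

With this sample data, `main` prints `[3, 0, 4]`: three zero-width non-whitespace glyphs, no zero-width whitespace, four glyphs in total.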



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
