[ https://issues.apache.org/jira/browse/PDFBOX-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-611:
--------------------------------------

    Fix Version/s:     (was: 0.8.0-incubator)

> PDSimpleFont.  Font height reported as zero.
> --------------------------------------------
>
>                 Key: PDFBOX-611
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-611
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 0.8.0-incubator
>         Environment: Win and Linux
>            Reporter: Peter Costello
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The logic for PDSimpleFont.getFontHeight() can return a value of zero.
> This corrupts or compromises text extraction and layout.
> In particular, test with
> http://www.encana.com/investor/financial/shareholder/pdfs/info-circular-french.pdf, page 12.
> When a PDFontDescriptor is used, the current logic falls back through the following, in order:
>    1) the average of xHeight and capHeight, where
>              xHeight is the height from the baseline to the top of a lowercase letter such as 'x', and
>              capHeight is the height from the baseline to the top of an uppercase Latin character
>    2) xHeight
>    3) capHeight
>    4) ascent
>    5) zero
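To make the fallback order above concrete, here is a minimal sketch of it as a standalone method. It is not the actual PDSimpleFont code: the class name CurrentFontHeight is hypothetical, the PDFontDescriptor is replaced by plain float parameters, and it assumes a value of 0 means "missing from the descriptor".

```java
// Hypothetical standalone sketch of the current fallback order described in the report.
// Plain floats stand in for PDFontDescriptor values; 0 is assumed to mean "missing".
public final class CurrentFontHeight {

    private CurrentFontHeight() {}

    public static float fontHeight(float xHeight, float capHeight, float ascent) {
        if (xHeight != 0 && capHeight != 0) {
            return (xHeight + capHeight) / 2; // 1) average of xHeight and capHeight
        } else if (xHeight != 0) {
            return xHeight;                   // 2) xHeight only
        } else if (capHeight != 0) {
            return capHeight;                 // 3) capHeight only
        } else if (ascent != 0) {
            return ascent;                    // 4) ascent
        }
        return 0;                             // 5) nothing usable: zero
    }

    public static void main(String[] args) {
        System.out.println(fontHeight(518, 715, 715)); // 616.5
        System.out.println(fontHeight(518, 0, 715));   // 518.0
        System.out.println(fontHeight(0, 0, 0));       // 0.0
    }
}
```

With the sample metrics quoted later in the report (xHeight 518, capHeight 715), the result swings from 616.5 to 518 depending only on whether capHeight is present, and drops to zero when every value is missing.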
> This is really bizarre: xHeight is an optional parameter, and capHeight is often missing.
> The font bounding box, by contrast, is a required parameter, and its height is what Acrobat Reader
> uses when you select a line of text.
> The bounding box is not perfect, because it often overlaps the line above, but it is a consistent value.
> The problem with the current logic is that the reported height varies far too much, and a zero value
> can be reported.
> I have modified the logic as follows. The goal was to keep the nominal values the same as the
> current logic, but to return a very similar number when parameters are missing.
>     PDFontDescriptor desc = getFontDescriptor();
>     if (desc != null) {
>         float height = desc.getCapHeight();          // top of capitals to baseline (e.g. 715)
>         if (height == 0) {
>             height = desc.getAscent();               // max height above baseline (e.g. 715)
>             if (height == 0) {
>                 PDRectangle bbox = desc.getFontBoundingBox();
>                 if (bbox != null) {                  // getFontBoundingBox() returns null if /FontBBox is absent
>                     height = bbox.getHeight() / 2;   // half the bbox height (e.g. (1006 - (-325)) / 2 = 665.5)
>                 }
>                 if (height == 0) {
>                     height = desc.getXHeight();      // top of lowercase to baseline (e.g. 518)
>                     height -= desc.getDescent();     // descent is negative (e.g. -209), so this adds 209: total 727
>                 }
>             }
>         }
>         retval = height;
>     }
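The proposed fallback chain can be exercised without a PDF by lifting it into a standalone method. This is a minimal sketch, not the patch itself: the class name ProposedFontHeight and the parameters bboxLowerY/bboxUpperY (standing in for the lower-left and upper-right y values of /FontBBox, null when absent) are hypothetical, and 0 is assumed to mean "missing". The nested ifs of the patch are flattened into sequential checks, which gives the same result because each later check only runs while height is still zero.

```java
// Hypothetical standalone sketch of the proposed fallback chain from the report.
// Plain floats stand in for PDFontDescriptor values (0 = missing); the bounding box
// is given by its lower and upper y coordinates, or null when /FontBBox is absent.
public final class ProposedFontHeight {

    private ProposedFontHeight() {}

    public static float fontHeight(float capHeight, float ascent,
                                   Float bboxLowerY, Float bboxUpperY,
                                   float xHeight, float descent) {
        float height = capHeight;                          // top of capitals to baseline
        if (height == 0) {
            height = ascent;                               // max height above baseline
        }
        if (height == 0 && bboxLowerY != null && bboxUpperY != null) {
            height = (bboxUpperY - bboxLowerY) / 2;        // half the bounding-box height
        }
        if (height == 0) {
            // PDF descent values are negative, so subtracting adds the depth below the baseline.
            height = xHeight - descent;
        }
        return height;
    }

    public static void main(String[] args) {
        System.out.println(fontHeight(715, 715, -325f, 1006f, 518, -209)); // 715.0
        System.out.println(fontHeight(0, 0, -325f, 1006f, 518, -209));     // 665.5
        System.out.println(fontHeight(0, 0, null, null, 518, -209));       // 727.0
    }
}
```

Using the sample metrics from the report's comments, each fallback lands between roughly 665 and 727 instead of jumping to zero. Like the original snippet, it can still return zero if every value is missing.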

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
