Dan Caprioara created FOP-2701:
----------------------------------

             Summary: Some of the latin ligatures make text not searchable in PDF
                 Key: FOP-2701
                 URL: https://issues.apache.org/jira/browse/FOP-2701
             Project: FOP
          Issue Type: Bug
          Components: font/opentype
    Affects Versions: 2.1
         Environment: Windows 10, Calibri font.
            Reporter: Dan Caprioara
         Attachments: latn-ligatures-Antenna-House.pdf, latn-ligatures-FOP.pdf

This problem occurs with the Calibri font, which ships with the MS Office 
suite and Windows 10.

I tested with the following text: {{file settings}}.
The resulting PDF text contains ligatures: {{(fi)le se(tti)ngs}}

Searching for {{file}} in Acrobat Reader selects the first word, which is 
OK. But searching for {{set}} or {{settings}} gives no results.

The same example run with Antenna House works fine: searching for 
{{settings}} returns results.

Here is the complete FO file:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="a">
            <fo:region-body/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="a">
        <fo:flow flow-name="xsl-region-body">
            <fo:block font-family="Calibri" font-size="40pt">file settings</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>
{code}

Some considerations:
# A workaround would be to reject all the substitutions that are not part of 
org.apache.fop.fonts.type1.AdobeStandardEncoding. This would keep the (fi) 
ligature but reject the (tti) one. However, this seems to work only for 
Calibri and not for Roboto.
# I think there might be an issue with the font embedding, where some 
substitution mapping data is lost. This is just a guess; I am not sure how PDF 
deals with substitutions.

I know that setting xml:lang to "en" in the FO disables the ligatures, but that 
is not a solution for my project. I would appreciate any suggestions.
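
For reference, here is a minimal sketch of the xml:lang variant mentioned above. 
It is the same test file with only the attribute added. Placing xml:lang on the 
fo:block is an assumption made for illustration; the attribute could equally be 
set on another element, and the behavior may differ between FOP versions.

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="a">
            <fo:region-body/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="a">
        <fo:flow flow-name="xsl-region-body">
            <!-- Assumption: xml:lang is set here, on the block itself -->
            <fo:block font-family="Calibri" font-size="40pt" xml:lang="en">file settings</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>
{code}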



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
