Re: [XeTeX] Ligatures and searching in PDFs

Janusz S. Bień Mon, 10 May 2010 00:38:49 -0700

On Mon, 10 May 2010  Paul Foley <p...@mises.com> wrote:

> Try the following:
>
> \documentclass{article}
> \usepackage{xltxtra}
> \setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}
>
> \begin{document}
> Fifty afflicted fjords.
> \end{document}
>
> Load the PDF, and search for any of the words.
>
> The "fty", "ct" and "fj" ligatures aren't in Unicode, and the private-use
> characters obviously can't be decomposed by the PDF viewer.  The same
> problem will obviously occur for variant letter shapes, old-style digits,
> etc.
>
> But scanned documents in PDF often have an invisible text layer attached
> which can be searched, etc.; is it possible to use the same technique to put
> the decomposed letters over the visible private-use characters, so that
> documents remain searchable (and copy/paste-able)?


The proper solution would be to use the /ActualText feature of the PDF
specification.
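
A minimal sketch of how this might look for the example above, assuming
the accsupp package (not mentioned in this thread) and its dvipdfm
driver, which is the one that should apply to XeTeX output. The macro
name \searchable is hypothetical. Whether a given PDF viewer honours
/ActualText when searching or copying has to be checked separately.

```latex
\documentclass{article}
\usepackage{fontspec}
% Assumption: the dvipdfm driver is named explicitly here; some versions
% of accsupp may select it automatically under XeTeX.
\usepackage[dvipdfm]{accsupp}
\setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}

% Hypothetical helper: typeset the word as usual (ligatures still form),
% but wrap it in a marked-content span whose /ActualText is the plain word.
\newcommand*{\searchable}[1]{%
  \BeginAccSupp{method=escape,ActualText={#1}}#1\EndAccSupp{}}

\begin{document}
\searchable{Fifty} \searchable{afflicted} \searchable{fjords}.
\end{document}
```

Tagging whole words rather than single glyphs keeps the markup outside
the ligature, so the font can still substitute the private-use glyph.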

Best regards

Janusz

-- 
                     ,   
dr hab. Janusz S. Bien, prof. UW -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
