On Mon, 10 May 2010 Paul Foley <p...@mises.com> wrote:

> Try the following:
>
> \documentclass{article}
> \usepackage{xltxtra}
> \setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}
>
> \begin{document}
> Fifty afflicted fjords.
> \end{document}
>
> Load the PDF, and search for any of the words.
>
> The "fty", "ct" and "fj" ligatures aren't in Unicode, and the private-use
> characters obviously can't be decomposed by the PDF viewer. The same
> problem will obviously occur for variant letter shapes, old-style digits,
> etc.
>
> But scanned documents in PDF often have an invisible text layer attached
> which can be searched, etc.; is it possible to use the same technique to put
> the decomposed letters over the visible private-use characters, so that
> documents remain searchable (and copy/paste-able)?
The proper solution would be to use the /ActualText feature of the PDF
specification.

Best regards

Janusz

--
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
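
For illustration, here is a minimal sketch of how the /ActualText suggestion might be applied to the example from the quoted message. It is not part of the original reply. It assumes the accsupp package is installed and that its dvipdfm driver works with XeTeX's xdvipdfmx backend. The macro name \searchable is hypothetical, and the sketch only handles plain ASCII words. Whether search and copy/paste then work also depends on the PDF viewer honouring /ActualText.

```latex
\documentclass{article}
\usepackage{xltxtra}
% Assumption: the dvipdfm driver of accsupp is the right one for XeTeX.
\usepackage[dvipdfm]{accsupp}
\setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}

% Hypothetical helper: typeset #1 as usual (ligatures included) and attach
% the same word, as plain characters, as /ActualText for the whole span.
\newcommand*{\searchable}[1]{%
  \BeginAccSupp{method=escape,ActualText={#1}}%
  #1%
  \EndAccSupp{}%
}

\begin{document}
\searchable{Fifty} \searchable{afflicted} \searchable{fjords}.
\end{document}
```

Each word is wrapped separately so that the replacement text stays attached to the glyphs it describes. Wrapping by hand like this is only practical for a short test document.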