There are several separate issues here, with distinct fixes. None of the issues
I can identify (there may be others) require that the user mess with the
document encoding in LyX, at least if it's something "typical", such as UTF8.

Issue #1: Ligatures.  PDF documents typically replace certain character pairs
(notably "fi") with special glyphs (ligatures) to reduce horizontal space and
improve appearance.  With "fi", the "i" gets tucked in under the overhang of the
"f".  (Note that "overhang" is not a professional typographer's term, as far as
I know.)  The special glyph is encoded as a single character (U+FB01 for "fi"
in Unicode), and I don't know of any document encoding that would let you paste
that one character (copied into the clipboard from, say, Acrobat Reader) and
have it automatically expand back into the two characters "fi".  That said, if
you open a PDF file in Evince (rather than Acrobat Reader)
and copy text including a ligature, then paste it into LyX (or a text editor),
Evince converts the ligature to the original two characters. I've tested this on
Linux. Evince is available for Windows, and I think (hope) it works there as
well. Also, there are tools for extracting text from PDF files, such as
pdftotext, that will convert the ligatures back to source characters.
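
For text that has already been pasted with the ligatures intact, they can also
be expanded after the fact.  The following is only a minimal Python sketch: it
assumes the pasted text arrives as Unicode and that the ligatures are the
standard Latin ones (U+FB00 through U+FB04, plus U+FB06).  The name
expand_ligatures and the table are made up for illustration.

```python
# Hypothetical helper: expand common Latin ligature characters back into
# the letters they stand for. The table is an assumption about which
# ligatures show up in pasted PDF text; extend it as needed.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb06": "st",
}

_LIGATURE_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Return text with each ligature character replaced by plain letters."""
    return text.translate(_LIGATURE_TABLE)


if __name__ == "__main__":
    print(expand_ligatures("The \ufb01rst of\ufb01ce \ufb02oor"))
```

An explicit table is used here rather than a blanket Unicode normalization so
that nothing other than the listed ligatures gets changed.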

Issue #2: Quotation marks.  For PDFs, this is a repeat of issue #1: copied from
Acrobat Reader, opening quotes are a non-ASCII byte and closing quotes paste as
a typewriter-style closing quote (not what you want in a document).  Evince
correctly converts them in the clipboard, and I think pdftotext will work (or at
least get you close enough that a global search/replace inside LyX finishes the
job).  I'm not sure about Word's smart quotes, since I don't use Word.  You
might try http://dan.hersam.com/tools/smart-quotes.html.
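
One way to get pasted text to the point where a search/replace can finish the
job is to reduce every typographic quote to its plain ASCII equivalent first.
A minimal Python sketch, assuming the text is already Unicode; the name
straighten_quotes and the choice of characters in the table are illustrative,
not taken from any of the tools mentioned above.

```python
# Hypothetical helper: map typographic ("smart") quotation marks to plain
# ASCII quotes. Which characters to include is an assumption; add others
# (guillemets, for instance) if your sources use them.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also the apostrophe)
}

_QUOTE_TABLE = str.maketrans(SMART_QUOTES)


def straighten_quotes(text):
    """Return text with smart quotes replaced by ASCII quotes."""
    return text.translate(_QUOTE_TABLE)


if __name__ == "__main__":
    print(straighten_quotes("\u201cDon\u2019t panic,\u201d she said."))
```

The straight quotes still need to be turned into proper document quotes
afterwards; this step only makes them uniform and easy to search for.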

Issue #3: Em-dashes and other long dashes.  Depending on the length of the dash,
these sometimes copy correctly from Acrobat Reader and sometimes do not (the
result is another odd one-byte code). Evince seems to do better.
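
When the dashes do survive the copy as Unicode characters, they can be rewritten
as hyphen runs.  The sketch below assumes the LaTeX convention of "--" for an en
dash and "---" for an em dash; whether LyX turns those back into real dashes in
the output depends on the LyX version and document settings, so treat the
mapping as an assumption.  The name replace_dashes is hypothetical.

```python
# Hypothetical helper: rewrite Unicode en and em dashes as hyphen runs,
# following the LaTeX convention (an assumption about the target document).
DASHES = {
    "\u2013": "--",   # en dash
    "\u2014": "---",  # em dash
}

_DASH_TABLE = str.maketrans(DASHES)


def replace_dashes(text):
    """Return text with en and em dashes replaced by '--' and '---'."""
    return text.translate(_DASH_TABLE)


if __name__ == "__main__":
    print(replace_dashes("pages 3\u20135"))
```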

So, bottom line, I'd introduce the students to either Evince or pdftotext --
Evince is probably easier -- and suggest they use that for copying from PDF and
pasting into LyX.
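
For anyone who would rather script the pdftotext route, a minimal Python sketch
follows.  It assumes pdftotext is installed and on the PATH; the function names
and the file name paper.pdf are made up for illustration.  Passing "-" as the
output file asks pdftotext to write to standard output.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext command line; '-' sends the text to stdout."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text.

    Raises subprocess.CalledProcessError if pdftotext exits with an error,
    and FileNotFoundError if pdftotext is not installed.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path),
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # "paper.pdf" is a placeholder; substitute a real file.
    print(extract_text("paper.pdf"))
```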

Paul
