RE: Matching Greek letters in UTF-8 file

Hamann, T.D. (Thomas) Mon, 10 Oct 2011 04:51:48 -0700

Many thanks for the replies. Reading the documentation, it looks like it's a 
bit more complicated than I had hoped.

On the other hand, I realized that for my purpose (removing unwanted hyphens 
from an OCR'ed document), I don't actually need to match the Greek letters 
explicitly, because they occur in only two formats throughout the whole 
document (which \w- and -\w- should match, respectively).
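
As a rough sketch of that idea (not the script actually used): the rule 
below is an assumption, and strip_hyphens is a hypothetical name. It drops a 
hyphen only when at least two word characters sit on each side, so a single 
letter in the \w- or -\w- position keeps its hyphens. On a decoded string, 
\w already covers Greek letters.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a hyphen only when it joins two runs of at
# least two word characters. Single-letter pieces such as "a-" or "-b-"
# (Greek or otherwise) are left alone. Assumes $text is a decoded
# character string, not raw UTF-8 bytes.
sub strip_hyphens {
    my ($text) = @_;
    $text =~ s/(?<=\w\w)-(?=\w\w)//g;
    return $text;
}

# \x{3b1} is GREEK SMALL LETTER ALPHA, \x{3b2} is GREEK SMALL LETTER BETA.
my $sample = "hyphen-ated \x{3b1}-amylase poly-\x{3b2}-hydroxy";
my $clean  = strip_hyphens($sample);

binmode STDOUT, ':encoding(UTF-8)';
print "$clean\n";
```

Note that this rule would also join legitimate compounds such as 
"well-known", so a real clean-up would need a word list or a narrower 
pattern.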

Thomas

________________________________________
From: Brian Fraser [frase...@gmail.com]
Sent: Thursday, 29 September 2011 16:59
To: John Delacour
CC: beginners@perl.org
Subject: Re: Matching Greek letters in UTF-8 file

On Thu, Sep 29, 2011 at 10:58 AM, John Delacour <johndelac...@gmail.com> wrote:

> use encoding 'utf-8';
>
>

Nitpick: Please don't use this, as the encoding pragma is broken. use utf8; and
use open qw< :std :encoding(UTF-8) >; should do as a replacement.
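
For illustration, a minimal sketch of a script header using those two 
pragmas. The helper name has_greek and the sample string are made up for 
this example, and the file itself is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                              # string literals in this file are UTF-8
use open qw< :std :encoding(UTF-8) >;  # STDIN/STDOUT/STDERR and default open() layers

# Hypothetical helper: true if the string contains a Greek-script character.
sub has_greek {
    my ($line) = @_;
    return $line =~ /\p{Greek}/ ? 1 : 0;
}

my $sample = "α-amylase";
print "$sample: ", (has_greek($sample) ? "contains Greek" : "no Greek"), "\n";
```

With these pragmas in place, lines read from STDIN or from files opened 
without an explicit layer arrive already decoded, so the same has_greek 
check can be applied to them directly.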

To the original poster, please note that there's a bit of a difference in
case-insensitive matching (i.e. using /i) -- newer versions of Perl do full
casefolding (so \N{GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI}
matches \N{GREEK SMALL LETTER ALPHA WITH PSILI}\N{GREEK SMALL LETTER IOTA}),
whereas older versions don't. So if you need to do that, I'd recommend
giving the docs a thorough read. Also this:
http://98.245.80.27/tcpc/OSCON2011/upr.html
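
A small sketch of the casefolding point, assuming Perl 5.16 or later for 
fc() (newer than this thread); same_when_folded is a hypothetical helper. 
The /i regex result is printed rather than assumed, since it depends on the 
Perl version.

```perl
use v5.16;                             # enables strict and the fc() feature
use warnings;
use charnames ':full';
use open qw< :std :encoding(UTF-8) >;

my $capital = "\N{GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI}";
my $small   = "\N{GREEK SMALL LETTER ALPHA WITH PSILI}\N{GREEK SMALL LETTER IOTA}";

# Explicit full casefolding: the single capital character folds to the
# two-character sequence held in $small.
sub same_when_folded {
    my ($left, $right) = @_;
    return fc($left) eq fc($right) ? 1 : 0;
}

say "fc() comparison: ", (same_when_folded($capital, $small) ? "equal" : "different");

# Whether this matches depends on the Perl version, so just report it.
say "/i regex match:  ", ($capital =~ /\A$small\z/i ? "yes" : "no");
```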
