Many thanks for the replies. Reading the documentation, it looks like it's a bit more complicated than I had hoped.
On the other hand, I realized that for my purpose (removing unwanted hyphens from an OCR'ed document), I don't actually need to match the Greek letters, because they occur in only two formats throughout the whole document (which should match \w- and -\w-).

Thomas

________________________________________
From: Brian Fraser [frase...@gmail.com]
Sent: Thursday, 29 September 2011 16:59
To: John Delacour
CC: beginners@perl.org
Subject: Re: Matching Greek letters in UTF-8 file

On Thu, Sep 29, 2011 at 10:58 AM, John Delacour <johndelac...@gmail.com> wrote:

> use encoding 'utf-8';

Nitpick: Please don't use this, as encoding is broken. use utf8; and use open qw< :std :encoding(UTF-8) >; should do as a replacement.

To the original poster, please note that there's a bit of a difference in case-insensitive matching (i.e. using /i): newer versions of Perl do full casefolding (so \N{GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI} matches \N{GREEK SMALL LETTER ALPHA WITH PSILI}\N{GREEK SMALL LETTER IOTA}), whereas older versions don't. So if you need to do that, I'd recommend giving the docs a thorough read.

Also this: http://98.245.80.27/tcpc/OSCON2011/upr.html
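
For the archives, here is a minimal sketch of the hyphen-removal idea from the top of this message, using the use utf8 / use open setup recommended in the quoted reply. It is only an illustration under assumptions of mine: that the unwanted hyphens are the ones OCR leaves at line breaks, and that a hyphen preceded by a single word character (the \w- and -\w- formats) should be kept. The helper name dehyphenate and the \w{2,} rule are hypothetical, not something taken from the thread.

```perl
use strict;
use warnings;
use utf8;                              # this source file may contain UTF-8 literals
use open qw< :std :encoding(UTF-8) >;  # treat STDIN/STDOUT/STDERR and opened files as UTF-8

# Hypothetical helper: rejoin words that were split across a line break
# ("exam-\nple" becomes "example").
# Assumption: a hyphen preceded by only ONE word character is kept, which
# covers the two formats mentioned above (\w- and -\w-), so the Greek
# letters never have to be matched explicitly.
sub dehyphenate {
    my ($text) = @_;
    # At least two word characters, then "-" and a newline, then a word character.
    $text =~ s/(\w{2,})-\n(\w)/$1$2/g;
    return $text;
}
```

It could be applied to a whole file read into one string, e.g. print dehyphenate($text); whether real compound words broken at a line end should keep their hyphen is a separate question this sketch does not try to answer.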