Hi! I have a Perl script that parses RSS streams from different news sources, and I am having problems with national characters in the regexp used to match a keyword list against the RSS data.
Everything works fine with a simple regexp for plain English, i.e. words containing only the characters A-Z, a-z and 0-9:

if ( $description =~ m/\b$key/i ) {….}

Keywords or RSS data with national characters don't work at all. I'm not really surprised; this was expected, as the character sets used in the different RSS streams are outside my control. I have the "use utf8;" pragma enabled, but I'm not sure it is needed, since I can't see any difference with or without it.

If I convert all the national characters in the keyword list to HTML entities ("&aring;" and so on), and use a character parser to change every occurrence of octal and Unicode character codes (i.e. decimal and hex) in the RSS data to the same HTML form, everything works fine, but it takes time that I would like to avoid.

Do you have any suggestions on this character issue? Is it possible to determine the character set of a text efficiently? Are there other ways to solve the problem?

/Christer
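
For illustration, here is a minimal sketch of one way the matching might work without the entity conversion. It assumes each feed's character set is known (for example from the XML declaration or the HTTP Content-Type header), decodes the raw bytes into Perl character strings with the core Encode module, and then matches the keywords directly. The helper names decode_feed_text and matching_keywords and the sample data are hypothetical, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # needed here because the keywords below are UTF-8 literals in the source
use Encode qw(decode);

# Hypothetical helper: turn raw feed bytes into a Perl character string.
# Assumption: the caller knows the feed's charset; UTF-8 is the fallback.
sub decode_feed_text {
    my ($bytes, $charset) = @_;
    return decode($charset || 'UTF-8', $bytes);
}

# Hypothetical helper: return the keywords found at a word boundary,
# case-insensitively. \Q...\E keeps regexp metacharacters in a keyword literal.
sub matching_keywords {
    my ($description, @keys) = @_;
    return grep { $description =~ m/\b\Q$_\E/i } @keys;
}

# Made-up sample: a description delivered as ISO-8859-1 bytes.
my $raw  = "Nyheter fr\xE5n \xC5rets m\xE4ssa";
my $text = decode_feed_text($raw, 'ISO-8859-1');
my @hits = matching_keywords($text, 'årets', 'mässa', 'sport');

# Encode on output so the terminal receives UTF-8 bytes.
binmode(STDOUT, ':encoding(UTF-8)');
print join(', ', @hits), "\n";
```

Once both the keywords and the feed text are character strings, \b and the /i modifier follow Unicode rules, so no per-character rewriting should be required. Numeric character references inside the XML would still need to be resolved, which an XML parser normally does while reading the feed.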