Hello,
 
I am trying to convert HTML character entities to the characters they represent.
For example, converting    &#8221;      to     "   .
 
I have the string
$str = "&#8220; test &#8221; ניסיון ";
The string also contains Hebrew letters.
 
1. First I did:
$str = decode_entities($str);
This converts the entities correctly.
The problem is that the Hebrew comes out wrong.
When I print the value of $str, the Hebrew appears as  יסיון
 
2. Then I decided to write a regular expression that changes only the
HTML entities.
I wrote:
$str = "&#8220; test &#8221; ניסיון ";
$str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/ge; 
Even though it should only touch the matched substrings, it seems to
affect the Hebrew letters as well.
The Hebrew letters again come out as  יסיון
Parts 1 and 2 give the same output.
 
3. To check the regular expression, I removed the 'e' modifier at the
end of the substitution so that I could see what it matches.
I wrote:
$str = "&#8220; test &#8221; ניסיון ";
$str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/g; 
The output was:
decode_entities(&#8220;) test decode_entities(&#8221;) ניסיון 
The Hebrew came out okay, of course.
 
4. I can do:
$str =~ s/&#8220;|&#8221;/"/g;
This doesn't affect the Hebrew, and it converts these two entities.
The problem is that the data contains other HTML entities as well.
I would like something more generic that will also work in the
future.
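One generic approach that might work is sketched below. It assumes $str holds raw UTF-8 bytes rather than a decoded Perl character string (I have not confirmed how my data is read, so this is only a guess at the cause). The idea is to decode the bytes with Encode first, run decode_entities on the whole string, and set an encoding layer on output. The helper name decode_html_text and the sample bytes are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(decode_entities);

# Assumption: the input is raw UTF-8 bytes (for example, read from a file
# or socket without an :encoding layer), not a Perl character string.
sub decode_html_text {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # UTF-8 bytes -> character string
    return decode_entities($chars);        # entities -> characters; returns a copy
}

# Sample input: the Hebrew word is written as explicit UTF-8 bytes so the
# example does not depend on the encoding of this source file.
my $bytes = "&#8220; test &#8221; \xD7\xA0\xD7\x99\xD7\xA1\xD7\x99\xD7\x95\xD7\x9F ";
my $text  = decode_html_text($bytes);

# Encode characters back to UTF-8 when printing.
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```

If the string is a literal in the script instead, `use utf8;` at the top of a UTF-8 encoded source file should have a similar effect to the explicit decode step.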
Any ideas are welcome!!
Shlomit.
