Hello,
I am trying to convert HTML special characters (entities) to the real characters they stand for.
For example, converting ” to ".
I have this string:
$str = "“ test ” ניסיון ";
The string also contains Hebrew letters.
1. First I did:
$str = decode_entities($str);
It converts the special characters correctly.
The problem is that the Hebrew comes out wrong.
When I print the value of $str, the Hebrew appears as ×ס×××
2. Then I decided to write a regular expression that changes only the
HTML special characters.
I wrote:
$str = "“ test ” ניסיון ";
$str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/ge;
Even though it should affect only the matched substrings, it seems that
it also affects the Hebrew letters.
The Hebrew letters again come out as ×ס×××
Parts 1 and 2 give the same output.
3. I decided to check the regular expression, so I removed the 'e' modifier at the
end of the substitution to see what it matches.
I wrote:
$str = "“ test ” ניסיון ";
$str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/g;
The output was:
decode_entities(“) test decode_entities(”) ניסיון
The Hebrew came out okay, of course.
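For anyone comparing steps 2 and 3: with the 'e' modifier the replacement part is evaluated as Perl code, and without it the replacement is inserted as ordinary interpolated text. A minimal sketch of that difference, using the built-in uc as a stand-in for decode_entities (the helper names with_e and without_e and the sample string are made up for illustration):

```perl
use strict;
use warnings;

# With /e the replacement is executed as Perl code.
sub with_e {
    my ($s) = @_;
    $s =~ s/(\w+)/uc($1)/ge;
    return $s;
}

# Without /e the replacement is just text with $1 interpolated into it.
sub without_e {
    my ($s) = @_;
    $s =~ s/(\w+)/uc($1)/g;
    return $s;
}

print with_e("a b"), "\n";      # prints: A B
print without_e("a b"), "\n";   # prints: uc(a) uc(b)
```

This is why step 3 leaves the Hebrew untouched: no function is actually called there, so no new characters are added to the string.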
4. I can do:
$str =~ s/“|”/"/g;
This doesn't affect the Hebrew, and it converts the HTML characters.
The problem is that there are other HTML special characters in
the data.
I would like to do something more generic that will also work in the
future.
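A minimal sketch of one generic approach that may be worth testing, under the assumption that the data arrives as UTF-8 encoded bytes: decode the bytes into Perl characters first, run decode_entities on the whole string, and encode back to UTF-8 only on output. The helper name decode_html_bytes is hypothetical, and the numeric entities &#8220; and &#8221; in the sample are assumed stand-ins for the curly quotes in my string.

```perl
use strict;
use warnings;
use utf8;                                   # literals in this file are UTF-8 text
use Encode qw(decode encode);
use HTML::Entities qw(decode_entities);

# Hypothetical helper: UTF-8 bytes in, UTF-8 bytes out, entities resolved.
sub decode_html_bytes {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);    # bytes -> Perl characters
    $chars = decode_entities($chars);       # entities -> characters
    return encode('UTF-8', $chars);         # characters -> bytes for output
}

# Simulate raw bytes as they might be read from the data source.
my $input = encode('UTF-8', "&#8220; test &#8221; ניסיון");
print decode_html_bytes($input), "\n";
```

The idea is that the Hebrew and the decoded entities are both handled as characters in the same string, so nothing gets encoded twice when the result is printed.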
Any ideas are welcome!!
Shlomit.