Hi!

>> As you say, it doesn't work properly. As a matter of fact, guessing
>> charsets, like timezones, is not possible. You need to know which
>> charset something is in. If not, you need to address *that* problem.

It is true that you cannot detect charsets with 100% accuracy. It is, however, also true that many charsets can be distinguished accurately enough to be useful, especially if you know the set of charsets you are dealing with. For example, Russian had about five commonly used encodings before everybody moved to UTF-8, plus several exotic ones. Being able to detect at least the major ones when dealing with a heterogeneous library of Russian-language texts is a great help. There may be other cases like this.

The point is that even imperfect detection may be useful in certain circumstances, and the fact that a detector is part of ICU suggests that people find it useful enough to spend time implementing and supporting it. We should not ignore that.

--
Stas Malyshev
smalys...@gmail.com

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
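
To illustrate detection within a known set of charsets, here is a minimal sketch in Python. It is not how ICU's detector works, and it is not a proposal for the PHP API. The candidate list, the scoring rule, and the names `detect_charset` and `score` are assumptions made up for this example. The idea: decode the bytes with each candidate, reward lowercase Russian letters (ordinary text is mostly lowercase), and penalize any other non-ASCII character.

```python
from typing import Optional, Sequence

# Assumed candidate set: the major encodings used for Russian text.
CANDIDATES = ("utf-8", "cp1251", "koi8-r", "cp866", "iso8859-5")

RUSSIAN_LOWER = set("абвгдежзийклмнопрстуфхцчшщъыьэюяё")
RUSSIAN_UPPER = {ch.upper() for ch in RUSSIAN_LOWER}


def score(text: str) -> int:
    """Naive plausibility score for decoded Russian text."""
    total = 0
    for ch in text:
        if ch in RUSSIAN_LOWER:
            total += 1      # expected in ordinary prose
        elif ch in RUSSIAN_UPPER or ord(ch) < 128:
            pass            # neutral: capitals, ASCII, punctuation
        else:
            total -= 1      # box drawing, stray symbols, other alphabets
    return total


def detect_charset(data: bytes,
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the best-scoring candidate, or None if none can decode data."""
    best_name, best_score = None, None
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue        # invalid byte sequence rules this candidate out
        s = score(text)
        if best_score is None or s > best_score:
            best_name, best_score = name, s
    return best_name


if __name__ == "__main__":
    sample = "Съешь ещё этих мягких французских булок, да выпей чаю."
    for enc in CANDIDATES:
        print(enc, "->", detect_charset(sample.encode(enc)))
```

A heuristic like this can be wrong on short or atypical input (for example, text in all capitals), which is exactly the "imperfect but useful" trade-off: it only works because the set of candidates is small and known in advance.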