Re: [PHP-DEV] IntlCharsetDetector

Stanislav Malyshev Mon, 11 Apr 2016 09:37:23 -0700

Hi!

>> As you say, it doesn't work properly. As a matter of fact, guessing 
>> charsets, like timezones, is not possible. You need to know which 
>> charset something is in. If not, you need to address *that* problem.


It is true that you cannot detect charsets with 100% accuracy. It is,
however, also true that many charsets can be distinguished accurately
enough to be useful, especially if you know the set of charsets you are
dealing with. E.g., Russian had about five commonly used encodings
before everybody started to use UTF-8, plus several exotic ones. Being
able to detect at least the major ones is a great help when dealing
with a heterogeneous library of Russian-language texts. There may be
other cases like this.
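To make the "known set of charsets" point concrete, here is a minimal Python sketch. It decodes the bytes with each candidate codec, drops codecs that reject the input, and keeps the one whose output contains the most frequent Russian letters and the fewest stray symbols. The candidate list, the letter set, the scoring rule, and the function names are assumptions made for illustration. ICU's CharsetDetector is considerably more elaborate than this.

```python
# Rough sketch: guess which of a few known Cyrillic encodings some bytes
# are in. Assumption: the input is Russian prose in one of CANDIDATES;
# for anything else the guess is meaningless.
from typing import Optional

# Python codec names (hypothetical choice of candidates). MacCyrillic is
# omitted: its lowercase letters sit at nearly the same byte values as in
# Windows-1251, so this crude score cannot tell the two apart.
CANDIDATES = ("utf_8", "koi8_r", "cp1251", "cp866", "iso8859_5")

# Some of the most frequent lowercase letters in Russian text.
COMMON_LETTERS = set("оеаинтсрвлкмдпу")


def plausibility(text: str) -> int:
    """+1 per frequent Russian letter, -1 per character that is neither
    ASCII nor a basic Cyrillic letter (box drawing, symbols, controls)."""
    score = 0
    for ch in text:
        if ch in COMMON_LETTERS:
            score += 1
        elif ch.isascii() or "\u0410" <= ch <= "\u044f" or ch in "ёЁ":
            continue
        else:
            score -= 1
    return score


def guess_russian_encoding(data: bytes) -> Optional[str]:
    """Return the candidate codec whose decoding looks most like Russian."""
    best_name: Optional[str] = None
    best_score: Optional[int] = None
    for name in CANDIDATES:
        try:
            text = data.decode(name)  # strict: invalid bytes rule a codec out
        except UnicodeDecodeError:
            continue
        s = plausibility(text)
        if best_score is None or s > best_score:
            best_name, best_score = name, s
    return best_name


if __name__ == "__main__":
    sample = "Это простой пример текста на русском языке, который нужно распознать."
    for enc in CANDIDATES:
        print(enc, "->", guess_russian_encoding(sample.encode(enc)))
```

Restricting the candidate list is what makes even this crude score workable: with a handful of encodings and a known language, wrong decodings tend to produce uppercase soup, box-drawing characters, or invalid UTF-8. With arbitrary input in an unknown language, the same approach would guess far less reliably.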

The point is that even imperfect detection may be useful in certain
circumstances, and the fact that the detector is part of ICU suggests
that people find it useful enough to spend time implementing and
supporting it. We should not ignore that.

-- 
Stas Malyshev
smalys...@gmail.com

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
