On Mon, 4 Apr 2016, Sara Golemon wrote:

> The subject of character set detection (yes, I know, a hard problem to
> solve) came up on SO chat, and Niki noticed that we don't yet wrap the
> ICU UCharsetDetector API so I volunteered to put something together.
>
> https://github.com/php/php-src/compare/master...sgolemon:intl.charsetdetector
>
> The trouble is, for the WIDE majority of my test cases so far, ICU is
> really bad at detecting character sets correctly (as I said, it's a
> tough problem). In fact, the ICU manual admits that it doesn't even
> look at all of the corpus text, and the "language detection" is a
> byproduct not meant for actual language detection.
>
> Given all that, I'm inclined to reject the idea of rolling this into
> PHP for fear of just confusing users without actually adding any
> value.
>
> Thoughts?
I would advise against adding this. As you say, it doesn't work
properly. As a matter of fact, guessing charsets, like timezones, is
not possible. You need to know which charset something is in. If you
don't, you need to address *that* problem.

cheers,
Derick

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php