Re: How to detect the encoding of a string?

Bruno Haible Fri, 03 Jun 2005 14:08:36 -0700

Abel Cheung wrote:
> >    (because there are very few
> >    meaningful strings which look like UTF-8 but aren't).
>
> Yes, that's rare, though a real-world case has happened before,
> especially with multibyte characters. Here is a sample:
>
> http://qa.mandrakesoft.com/show_bug.cgi?id=3935


Yes. It's a heuristic, and every heuristic fails on some inputs. The
programmer has to weigh the benefit for the many users for whom it "just
works" against the problem it will cause for a few. In this case, when the
heuristic doesn't work, the result will be a filename that is garbage, just
a different garbage than if no heuristic had been applied.
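
To make the trade-off concrete, here is a minimal sketch of such a heuristic
in Python. It is an illustration only, not the code of any particular
program: the function name decode_filename and the ISO-8859-1 fallback are
assumptions made for the example. It tries a strict UTF-8 decode first and
falls back to the legacy encoding only when that fails.

```python
def decode_filename(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess the encoding of raw bytes (hypothetical helper).

    Strict UTF-8 is tried first; anything that is not well-formed UTF-8
    is decoded with the assumed legacy fallback encoding instead.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


# The common cases "just work":
utf8_name = decode_filename(b"caf\xc3\xa9")    # UTF-8 bytes for "café"
latin1_name = decode_filename(b"caf\xe9")      # ISO-8859-1 bytes for "café"

# The rare failure: a genuine ISO-8859-1 string whose bytes happen to be
# well-formed UTF-8. "Ã©" is the two bytes C3 A9, which also decode as "é".
misdetected = decode_filename("\u00c3\u00a9".encode("iso-8859-1"))
```

In the last case the heuristic picks UTF-8 and returns "é" where the
ISO-8859-1 reading would have been "Ã©". Without the heuristic, real UTF-8
names would instead show up as two-character sequences like "Ã©", so the
choice is between two kinds of garbage, not between garbage and none.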

Bruno


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
