On Wed, 10 Aug 2005, mohsen ali momeni wrote:

> Hi,
> Thanks for reply,
> What I exactly need is CP1256 detection, and after that detecting
> whether the language is Persian or not.

As you can guess, all the 8-bit (non-Unicode) character sets share the
same 8-bit space, so they overlap all the time. Your only bet at charset
detection is to look at the areas that are left unencoded in each
character set, and cross out any charset whose forbidden areas are used
in the text.

As for language detection, which can help with charset detection too,
you can look for the string SPACE REH ALEPH SPACE as a good indicator of
Persian.

> Regards,

--behdad
http://behdad.org/
_______________________________________________
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing
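
A rough Python sketch of the two ideas above, for illustration only. It
assumes that Python's strict codecs reject bytes a charset leaves
unencoded, which stands in for a hand-built table of forbidden areas.
The helper names (surviving_charsets, looks_persian), the candidate list
and the sample sentence are all made up for this example. Several
charsets will often survive the first step, which is where the language
check helps.

```python
# Hypothetical helpers: a sketch of the elimination idea, not a real detector.

# SPACE, REH (U+0631), ALEF (U+0627), SPACE: the suggested Persian indicator.
PERSIAN_MARKER = " \u0631\u0627 "


def surviving_charsets(data, candidates):
    """Return the candidate charsets under which `data` uses no forbidden byte."""
    survivors = []
    for name in candidates:
        try:
            data.decode(name)  # strict decoding fails on unencoded positions
        except UnicodeDecodeError:
            continue  # cross this charset out
        survivors.append(name)
    return survivors


def looks_persian(data, charset):
    """Heuristic only: is the marker present when `data` is read as `charset`?"""
    return PERSIAN_MARKER in data.decode(charset, errors="replace")


# Made-up sample: a short Persian sentence containing the marker, as CP1256 bytes.
sample = (
    "\u0645\u0646 \u0622\u0646 \u0631\u0627 "
    "\u062e\u0648\u0627\u0646\u062f\u0645"
).encode("cp1256")

candidates = ["ascii", "utf-8", "cp1256", "iso8859_6"]
survivors = surviving_charsets(sample, candidates)
print(survivors)
print([name for name in survivors if looks_persian(sample, name)])
```

Treat the marker test as a hint rather than proof: a short text may simply
not contain the word, and the survivor list only tells you which charsets
are still possible, not which one is right.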