On Wed, 10 Aug 2005, mohsen ali momeni wrote:

> Hi,
> Thanks for the reply,
> What I exactly need is CP1256 detection, and after that detecting
> whether the language is Persian or not.

As you can guess, the non-Unicode 8-bit character sets all share
the same byte space, so they overlap all the time.  Your best bet
for charset detection is to look at the byte values that each
character set leaves unassigned, and cross out any charset whose
unassigned bytes show up in the text.  Language detection can
help with charset detection too: you can look for the string
SPACE REH ALEPH SPACE as a good indicator of Persian.
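
A rough sketch of both ideas in Python, in case it helps.  It
assumes Python's strict decoders reject unassigned bytes, which
they do for the codecs listed here as far as I know.  The function
names and the candidate list are made up for illustration, and a
real detector would need more than one marker word.

```python
# Hypothetical candidate list; extend it with whatever charsets you expect.
CANDIDATES = ("cp1256", "iso8859_6", "cp1252")

# SPACE REH ALEF SPACE, written with escapes to stay encoding-neutral.
PERSIAN_MARKER = " \u0631\u0627 "


def candidate_charsets(data, charsets=CANDIDATES):
    """Return the charsets under which every byte of data is assigned."""
    survivors = []
    for name in charsets:
        try:
            # Strict decoding fails on a byte the charset leaves unassigned.
            data.decode(name)
        except UnicodeDecodeError:
            continue  # cross this charset out
        survivors.append(name)
    return survivors


def looks_persian(text):
    """Heuristic only: true if the marker word appears in the text."""
    return PERSIAN_MARKER in text
```

Typical use would be to decode the bytes with each surviving
charset and keep the one for which looks_persian() returns true.
Elimination alone can leave more than one candidate, so treat the
result as a shortlist, not an answer.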


> Regards,

--behdad
http://behdad.org/
_______________________________________________
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing
