Re: farsi language auto-detection in web pages

2005-08-10 Thread Behdad Esfahbod
On Wed, 10 Aug 2005, mohsen ali momeni wrote: Hi, thanks for the reply. What I exactly need is CP1256 detection, and after that, detecting whether the language is Persian or not. As you can guess, all non-Unicode character sets share the same 8-bit space, so they overlap all the time. Your only
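For the second half of the question (Persian or not, once the bytes are taken to be CP1256), one rough heuristic is to look for the four letters that Persian adds to the Arabic alphabet: peh, tcheh, jeh and gaf, all of which have code points in CP1256. The sketch below is only an illustration under that assumption, not something proposed in the thread. The function name `looks_persian` and the 1% threshold are made up for the example, and the same letters also occur in Urdu, Kurdish and other languages, so a positive result means "probably not Arabic" rather than "definitely Persian".

```python
# Letters Persian adds to the basic Arabic alphabet (all present in CP1256):
# peh U+067E, tcheh U+0686, jeh U+0698, gaf U+06AF.
PERSIAN_EXTRA = {"\u067e", "\u0686", "\u0698", "\u06af"}


def looks_persian(data: bytes, threshold: float = 0.01) -> bool:
    """Guess whether bytes ASSUMED to be CP1256 contain Persian text.

    Hypothetical helper: the threshold is an arbitrary illustrative value.
    """
    text = data.decode("cp1256", errors="replace")
    # Keep only characters from the Unicode Arabic block.
    arabic_script = [ch for ch in text if "\u0600" <= ch <= "\u06ff"]
    if not arabic_script:
        return False
    hits = sum(ch in PERSIAN_EXTRA for ch in arabic_script)
    return hits / len(arabic_script) >= threshold
```

On a short input a single gaf or peh is enough to cross the threshold, so this is better suited to whole pages than to individual words.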

Re: farsi language auto-detection in web pages

2005-08-09 Thread Paul Hastings
mohsen ali momeni wrote: How can I auto-detect the language of a webpage without knowing its charset? (Suppose the language and charset are not defined in the header.) kind of a lousy way to do things but ... Is there a simple (not time-consuming) method to detect a page's charset?

Re: farsi language auto-detection in web pages

2005-08-09 Thread Behdad Esfahbod
On Tue, 9 Aug 2005, mohsen ali momeni wrote: Hi, how can I auto-detect the language of a webpage without knowing its charset? (Suppose the language and charset are not defined in the header.) Is there a simple (not time-consuming) method to detect a page's charset? If it's UTF-8 or UTF-16, kinda easy, not
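To illustrate why the Unicode case is the easy one: UTF-16 pages usually start with a byte-order mark, and UTF-8 has a strict byte structure that text in an 8-bit charset such as CP1256 almost never satisfies by accident. The following is a minimal sketch along those lines, not the method from the message above (which is cut off here). The function name `sniff_unicode` is made up for the example, and it assumes the whole page is available, since a buffer cut in the middle of a multi-byte character will fail the UTF-8 check.

```python
import codecs


def sniff_unicode(data: bytes):
    """Return 'utf-8', 'utf-16', or None if neither can be confirmed."""
    # A byte-order mark is the strongest signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM: see whether the bytes form valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    # Pure ASCII is valid UTF-8 but says nothing about the charset.
    if all(b < 0x80 for b in data):
        return None
    return "utf-8"
```

A `None` result only means "not confirmably Unicode"; telling the 8-bit charsets apart from each other still needs some other approach, such as letter-frequency statistics.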