Re: Strange request that English language only be entered in a text area.

Paul Hastings Wed, 29 Sep 2004 18:38:53 -0700

if you're using unicode (or don't mind converting the text to unicode), you
could determine the unicode block(s) of the text to guess the "ballpark"
language, tossing out anything where the majority of the text isn't "basic
latin". see http://www.sustainablegis.com/unicode/testUBlocks.cfm for an
example. after that you could simply check whether there are any non-english
chars (well, practically anything past \u007E). it's nowhere near foolproof
(plain-ASCII text in another latin-script language would still pass), but it
is free & easy.
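
a minimal sketch of the two checks above, in python rather than the
coldfusion used on the linked page. it assumes "basic latin" means the
unicode block U+0000 to U+007F and that a simple majority threshold of 0.5 is
wanted; the function names and the threshold are made up for illustration and
are not taken from the linked example.

```python
def basic_latin_ratio(text):
    """Fraction of non-whitespace characters in the Basic Latin block (U+0000-U+007F)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # Nothing to judge; treat empty input as passing (an assumption).
        return 1.0
    return sum(1 for c in chars if ord(c) <= 0x7F) / len(chars)


def chars_past_tilde(text):
    """Sorted list of distinct characters with a code point above U+007E."""
    return sorted({c for c in text if ord(c) > 0x7E})


def looks_english(text, threshold=0.5):
    """Rough guess only: mostly Basic Latin AND nothing past U+007E.

    The threshold is an assumed value, not something from the original post.
    """
    if basic_latin_ratio(text) <= threshold:
        return False  # majority of the text is outside Basic Latin
    return not chars_past_tilde(text)
```

calling chars_past_tilde() on rejected input gives you the offending
characters to show back to the user. note the second check is strict: it
also rejects english text containing accented loanwords or curly quotes, so
you may prefer to rely on the ratio alone.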

if you want near certainty then you'd need something like xerox's language
guesser:

http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser-ISO-8859-1.en.html

or its unicode cousin:

http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser.en.html
