> However, it still seems to me that implementing international
> localisations might not be that easy.
> "Adding my words to the dictionary" sounds a bit scary to me, considering
> that the Czech language (in my case) has about the same number of words as
> English, not to mention that it uses a different character set.
> Please correct me if I am wrong. How exactly do you expect it to
> work?

My idea is to provide a small (~4000 words) English dictionary. [The list
would come from the search engine I built for AdaIC; it consists of the most
common words on about 50,000 web pages of material on the Ada programming
language. Not a perfect source, but adequate.] The filter would have a
series of sensitivity settings for Trash/Spam/Delete. In initial use,
Delete/Spam would be off.
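The post doesn't say how a sensitivity setting is scored. One plausible reading is a threshold on the fraction of a message's words that aren't in the dictionary. Here is a minimal sketch under that assumption; the threshold values and the names THRESHOLDS, unknown_fraction and classify are hypothetical, not part of the actual filter.

```python
import re

# Hypothetical thresholds: the fraction of unknown words at which each
# action fires. None means the action is switched off, as Delete and Spam
# would be during initial use.
THRESHOLDS = {"Trash": 0.10, "Spam": None, "Delete": None}

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only (no digits or '_')


def words(text):
    """Split text into lower-cased words."""
    return [w.lower() for w in WORD_RE.findall(text)]


def unknown_fraction(text, dictionary):
    """Fraction of the message's words that are not in the dictionary."""
    ws = words(text)
    if not ws:
        return 0.0
    unknown = sum(1 for w in ws if w not in dictionary)
    return unknown / len(ws)


def classify(text, dictionary, thresholds=THRESHOLDS):
    """Return the most severe enabled action whose threshold is met, else 'Relay'."""
    frac = unknown_fraction(text, dictionary)
    for action in ("Delete", "Spam", "Trash"):  # most severe first
        limit = thresholds.get(action)
        if limit is not None and frac >= limit:
            return action
    return "Relay"


if __name__ == "__main__":
    d = {"the", "ada", "compiler", "is", "free"}
    print(classify("The Ada compiler is free", d))  # every word is known
    print(classify("Buy cheap pills now", d))       # every word is unknown
```

With only Trash enabled, a message made entirely of unknown words is trashed rather than deleted. Turning up the sensitivity later amounts to supplying non-None limits for Spam and Delete.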

Initially, a lot of messages would be trashed because of unknown words.
(This applies even to English mail, since domain-specific words would get
caught.) You'd use the TF Viewer (think modern Spameye) to process the
messages. If you decide to relay a message, the Viewer would automatically
analyze its word use and propose a list of words to add to the dictionary.
Each word would have a checkbox; you'd simply check the ones you want and
click OK. (The words are proposed rather than added automatically so that
misspelled words don't get into the dictionary. Even good mail has
misspelled words.)
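The "propose, then check" step might look roughly like the following sketch. It assumes words are letter runs compared case-insensitively and that the list is ordered by frequency. Both assumptions, along with the function names propose_words and accept, are illustrative and do not describe the Viewer's actual behaviour.

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only


def propose_words(text, dictionary):
    """Return the unknown words in a relayed message, most frequent first.

    These become the checkbox list; nothing is added until the user picks.
    """
    counts = {}
    for w in WORD_RE.findall(text):
        w = w.lower()
        if w not in dictionary:
            counts[w] = counts.get(w, 0) + 1
    # Descending count, then alphabetical, so the list order is stable.
    return sorted(counts, key=lambda w: (-counts[w], w))


def accept(dictionary, checked):
    """Add only the words the user checked (so misspellings stay out)."""
    dictionary.update(checked)


if __name__ == "__main__":
    d = {"the", "release", "is"}
    msg = "The GNAT release is out; the relase notes list GNAT fixes"
    proposals = propose_words(msg, d)
    print(proposals)
    # The user leaves the misspelling "relase" unchecked.
    accept(d, [w for w in proposals if w != "relase"])
```

Keeping the dictionary update behind an explicit user choice is what stops typos such as "relase" from becoming "known" words.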

You'd initially turn on the filter when you had time to watch the messages
carefully (so that important mail didn't get delayed too long). But it
wouldn't take long for most of the common words in your mail stream to be in
the dictionary, and less and less good mail would get filtered. After a few
days, you could turn on more aggressive filters to catch the real junk (and
turn up the sensitivity).

Not having tried it, I'm not certain that it will actually work in practice.
One possible problem is that this filter would assume that letters are laid
out as in Latin-1. I don't know whether Latin-2 has its letters and case
equivalences in the same places; if so, it would still work (even if the
characters aren't quite the same), since all that matters is that 16#C8# is a
capital letter, for example. There's no doubt that it would work best for
people who receive messages in a single language, but even if the setting
wasn't very aggressive, it would still catch a lot of spam full of gibberish.
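The 16#C8# example can be checked against Python's standard codecs. In both ISO 8859-1 (Latin-1) and ISO 8859-2 (Latin-2), that byte is a capital letter whose lower-case form is 16#E8#, even though the letters differ (È versus Č). This sketch checks only the one byte it is asked about. It does not establish that the two layouts match everywhere, and the helper name case_pair is hypothetical.

```python
def case_pair(byte, encoding):
    """Decode one byte and report the character, whether it is upper case,
    and the byte value of its lower-case form in the same encoding."""
    ch = bytes([byte]).decode(encoding)
    lower = ch.lower().encode(encoding)
    return ch, ch.isupper(), lower[0]


if __name__ == "__main__":
    for enc in ("latin-1", "iso8859_2"):
        ch, upper, low = case_pair(0xC8, enc)
        print(f"{enc}: 16#C8# = {ch!r}, upper={upper}, lower-case byte = 16#{low:02X}#")
```

A filter that only needs to know "16#C8# is upper case and folds to 16#E8#" would treat both bytes the same way. This matches the post's reasoning that the exact characters matter less than their case positions.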

                   Randy.

This is the discussion list for the IMS Free email server software.
  To unsubscribe send mailto:[EMAIL PROTECTED]

            Delivered by Rockliffe MailSite
           http://www.rockliffe.com/mailsite
                Rock Solid Software (tm)
