Hi,

The en_US dictionary is looking for a new maintainer. If you have time to help, can use the simple munch and unmunch tools, and can edit a large wordlist (one word per line), you have the skills to take over the en_US dictionary.

I simply do not have the volunteer time to do it anymore.

Also, I realize the current en_US dictionary is not an OED or unabridged dictionary. It should not be. There are simply too many rarely used words and word variations that actually end up hiding typical mistakes (spelling errors) in more commonly used words.

The key is to try to create a good working set that is common to most people. If more esoteric or domain-specific words are needed, additional dictionaries can easily be created (such as medical, statistical, legal, etc.).

So if anyone really wants ownership of en_US and is willing to learn how to maintain it (all it takes is pre-compiled versions of munch and unmunch and a text editor for your machine), I would be happy to pass it along to them and explain how to use the tools.

Kevin

On 22-Mar-05, at 3:59 AM, dwb wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian,

Ian D. Bollinger wrote:
| What are the criteria for a word being in one of OO's dictionaries?

Once again I can only speak for the en_GB dictionary.

This is a very hard question to answer. To a great extent it comes down to
the views of the editors. From our perspective the objective is to have the
fewest possible words in it consistent with a perceived high rate of valid
word recognition. The reason is simple: the more words one has the greater
the risk that a misspelling will slip through by virtue of being a similar
valid but very rare word, and the larger the number of corrections offered
and the lower the chance that the intended correct spelling will be near
the top of this list. So a list with too many words can also be perceived
as bad.
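
A minimal sketch of this trade-off in Python. The word lists are made up for illustration and are not taken from en_GB or en_US, and the "one edit away" suggestion rule is an assumption, not a description of how the OOo spell checker actually builds or ranks its suggestions.

```python
import string


def edits1(word):
    """Return every string that is one edit away from `word`
    (one deletion, transposition, replacement, or insertion)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def check(word, wordlist):
    """Return (accepted, suggestions) for `word` against `wordlist`."""
    if word in wordlist:
        return True, []
    return False, sorted(edits1(word) & wordlist)


# Hypothetical word lists: a small common set, and the same set
# extended with two rarer words.
common = {"the", "then", "than", "that", "them", "they"}
extended = common | {"thew", "thane"}

# A slip of the finger while typing "the" or "they".
print(check("thew", common))    # flagged, with common words suggested
print(check("thew", extended))  # accepted, so the mistake goes unnoticed

# A transposition of "then": the longer list offers more candidates.
print(check("thne", common))
print(check("thne", extended))
```

With the smaller list, "thew" is flagged and only common words are offered. With the extended list it passes silently, and "thne" picks up an extra, rarely intended candidate.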


In trying to assess whether a word is included or not it is necessary to
consider the frequency of its use and also the frequency of use of similar
words that could be mistaken for it.


| Cutpurse, ichor, imperator, nescience, pontifex, spearman, thaumaturgy,
| and viscounty are all in my copy of the Concise Oxford Dictionary
| (fourth edition).


Because a word is in the OED does not mean it should be in the OOo spelling
dictionary. There are in fact many hundreds (perhaps thousands) of words in
the concise OED that are not in the OOo spelling dictionary. And I think
this is correct.


| Also, marquisate is the preferred spelling of marquessate, and rapt
| the preferred spelling of wrapt, according to my dictionary.

Dealing with alternatives is also tricky. In general for en_GB we do not
include alternatives except where there is evidence that the OED is "out of
tune" with common usage. The most well known example of this is many words
ending in "-ise/-ize" such as "organise". In common usage the "-ise" suffix
dominates in en_GB, but "-ize" is preferred by the OED. So we have included
both.


| Finally, doppelgänger appears to be the preferred spelling of
| doppelganger in the current OED, although my dictionary insists it
| is spelled "doppel-gänger" as the tome is half a century old.

"doppelgänger" is in the en_GB dictionary.

Cheers
David.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFCP95VgH46HTEjeiURAucAAJ9wNWhUCScvzpAd6XHaAOPnnPU2MgCfQR3g
rAgiT73HfknTuCLmhSM64nI=
=ouDC
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


