To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=92383
                 Issue #|92383
                 Summary|submit new en_US.dic without the errors
               Component|Word processor
                 Version|OOo 2.4.1
                Platform|All
                     URL|
              OS/Version|All
                  Status|UNCONFIRMED
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|ENHANCEMENT
                Priority|P3
            Subcomponent|programming
             Assigned to|mru
             Reported by|aardvark12





------- Additional comments from [EMAIL PROTECTED] Fri Aug  1 16:08:09 +0000 2008 -------
I have completed my own en_US.dic for spell checking in Open Office. This was
essential.

There are a large number of errors in the existing Open Office en_US.dic
spelling dictionary. This is not surprising; it seems that someone used
Microsoft Word to check online word lists, such as the 110,000 word list that is
commonly available, and then included the approved results in the Open Office
spelling dictionary. The problem is that that 110,000 word list, which is
advertised as suitable for spell checking, contains about 8,000 errors. Many of
those errors ended up in Microsoft Word itself. Anyone using that word list, or
relying on Microsoft Word to produce error-free dictionaries, is going to end up
with a spell checker that is riddled with errors.

As an example, when I look at Microsoft Word or Open Office (the errors are the
same, except that Open Office has more of them), I see things like airbag [air
bag], airbase [air base], pointblank [point-blank], teabag [tea bag], tealeaves
[tea leaves], sanserif [sans serif], Roobbie [?], slowcoaches [?], antisemitic,
antisemitism [anti-Semitic, anti-Semitism], rightsize, eageyness, or Rafaellle.
Very few English words have 3 L's, yet if you use a simple search on en_US.dic,
you will find other words besides Rafaellle with 3 L's. There is little sense
trying to list every problem. I also have serious difficulties with the word
choices (again, all these problems stem from MS Word). The famous aviator is
Lindbergh, so why put the name Lindberg in a spell checker and create problems
for students? The name of the country is Liechtenstein, so why put Lichtenstein
in a spell checker? (There is a Roy Lichtenstein but, apologies to Roy, nobody
cares. Most people want the name of the country.) I thought that to remove the
garbage from en_US.dic I might have to take out 3,000 or 4,000 words, but the
actual number was much higher.
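
The "simple search" for tripled letters mentioned above could look something like the following. This is only a sketch: the function name and the sample entries are made up for illustration, and it assumes the usual Hunspell/MySpell .dic layout (a count on the first line, then one entry per line with optional "/FLAGS" after the word). It flags any tripled letter, not just L.

```python
import re

# Three identical letters in a row, case-insensitive (e.g. "lll").
TRIPLE = re.compile(r"([a-z])\1\1", re.IGNORECASE)


def triple_letter_words(dic_lines):
    """Return the entries that contain a tripled letter.

    Assumption: each line is "word" or "word/FLAGS"; a purely numeric
    line (the word count at the top of a .dic file) is skipped.
    """
    hits = []
    for line in dic_lines:
        # Drop the affix flags, keep only the word itself.
        word = line.strip().split("/", 1)[0]
        if not word or word.isdigit():
            continue
        if TRIPLE.search(word):
            hits.append(word)
    return hits


# Hypothetical sample entries, not taken from the real en_US.dic.
sample = ["4", "Rafaellle/M", "airbag/MS", "Lindbergh", "wall/S"]
print(triple_letter_words(sample))
```

Passing the lines of a real en_US.dic to the same function would list every suspect entry for manual review; a hit is only a candidate, not proof of a misspelling.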

I have seen published novels that relied on Microsoft Word. They may contain a
dozen or more misspelled words. Every error jerks a reader out of the illusion
created by the writing, and causes the reader to question the writer's
credibility. After about six errors many readers consider discarding a book.
Professionally produced books should not contain any errors--not one.

I have two novels on Amazon.com. I created my own spelling checker in 1993, and
every few years I revised the word list, so I have been at this for a while. In
2001, I released WORDFUN2.ZIP (118,000 word list), which is on simtel.net. That
spell checker contained many words suitable for Scrabble play. I used a
different spell checker for producing books. My spelling checker was updated in
2003 and in 2006, and in May-July 2008 I did a complete check of the word list
against published dictionaries, and integrated words from Open Office en_US.dic.

I typically use http://dictionary.reference.com. Entered words are checked
against the American Heritage Dictionary (my favorite), and the Random House
Unabridged Dictionary (very good), and a Webster's Unabridged (can be
questionable as it is so inclusive) and against WordNet (not to be trusted, as
some of their choices are flat out wrong).

My dictionary is very close in size to the existing en_US.dic used in Open
Office. I would like to offer it as an alternative for writers or business
professionals who don't want to look like idiots. (My dictionary doesn't contain
"alright." Nonprofessionals think it's just fine to use "alright," and will
start screaming and spitting at you that "alright" is a word.  Every writer I
know thinks the usage of that word is a sure indication of illiteracy. The
correct usage is "all right.")

I realize that most people won't care. I used to complain that the online
110,000 English word list, recommended for use in spell checkers, contained
8,000 misspellings. Nobody cared. But professionals do need an accurate word 
list.

The word list is available. I suppose it should be released under some sort of
GNU license. I could not get MUNCH or UNMUNCH for Hunspell (the people
maintaining the dictionary seem to regard it as proprietary), but I did find the
program MySpell, and was able to compile MUNCH and UNMUNCH using Puppy Linux,
then used MySpell MUNCH to compile a dictionary from my word list, then
transferred that dictionary to Windows and used it with Hunspell. I have also
replaced the existing en_US.dic in Open Office with my own version and have been
testing it out. It seems to work fine.
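
For anyone who cannot get MUNCH built, a plain word list can also be turned into an uncompressed .dic file. The sketch below is not what MUNCH does (MUNCH folds related words into affix flags); it only relies on the .dic convention that the first line holds an approximate word count and each following line holds one entry. The function name is hypothetical, and a matching .aff file is still assumed to exist alongside the result.

```python
def to_flat_dic(words):
    """Build the text of an uncompressed .dic file from a word list.

    Assumption: no affix flags are used, so every inflected form must
    already be present in the word list.
    """
    # Remove blanks and duplicates, then sort for a stable file.
    unique = sorted({w.strip() for w in words if w.strip()})
    # First line: the word count; then one word per line.
    return "\n".join([str(len(unique))] + unique) + "\n"


# Hypothetical three-word list for illustration.
print(to_flat_dic(["Lindbergh", "Liechtenstein", "Lindbergh"]), end="")
```

The result is larger than a MUNCH-compressed dictionary, but it is easy to inspect and to diff against a corrected word list.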

Some work needs to be done on the possessive forms (apostrophe-S). I never used
this with my own spelling checker, but instead parsed the root word. The same is
true of WordPerfect: it looks at the root word, and drops the apostrophe-S. The
reason for this is that ready-made possessive forms can never be accurate. 
English is loaded with words such as gerunds, which serve both as nouns and
verbs (singing, stuffing etc). And there are plenty of words that function both
as a noun and as an adjective. So making a sometimes noun possessive doesn't
keep people from misusing it. Most nouns take an apostrophe-S, even if they end
in S: Charles's tonsils, Jones's leg. But this rule doesn't apply to many
ancient or historical words, so: Moses', Isis', Achilles'. The rule says I
should write "Kansas's wheat fields." But if I write "Kansas's streams," then
there is too much sibilance, so the astute editor will change it to "Kansas'
streams." So neither WordPerfect nor I have ever tried to codify the use of
possessives, since the knowledgeable writer knows it can't be done.
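
The root-word approach described above can be sketched as follows. This is an illustration under stated assumptions, not the actual code of my spelling checker or of WordPerfect: the function names are invented, only the plain ASCII apostrophe is handled, and the checker deliberately does not judge whether "Kansas's" or "Kansas'" is the better choice.

```python
def possessive_root(token):
    """Return the root word of a possessive token, else the token itself."""
    # "Charles's" -> "Charles", "Jones's" -> "Jones"
    if len(token) > 2 and token[-2:] in ("'s", "'S"):
        return token[:-2]
    # "Moses'" -> "Moses", "Kansas'" -> "Kansas"
    if len(token) > 2 and token.endswith("s'"):
        return token[:-1]
    return token


def is_accepted(token, words):
    """Accept a token if its root is in the word list (a set of strings)."""
    return possessive_root(token) in words


# Hypothetical word list for illustration.
words = {"Charles", "Moses", "Kansas", "wheat"}
for token in ["Charles's", "Moses'", "Kansas's", "Kansas'", "Rafaellle's"]:
    print(token, is_accepted(token, words))
```

Because only the root is looked up, the dictionary needs no ready-made possessive entries, and the choice of possessive form stays with the writer.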

Apostrophes are used for living things, personifications, or words of space,
time, and weight. Also for common phrases like: heart's delight, stone's throw,
and water's edge. Note that "chair's leg" does not fit these criteria. However,
the phrase "he fell back into the chair's embrace" seems to pass because chairs
don't embrace, so this might be considered a personification. Most proper nouns
such as Titanic or London can be used as personifications, so the names of
cities, states, countries, rivers, and ships can easily take a possessive. Even
words like Chemistry can take a possessive form: Department of Chemistry's
examines. From this it is clear that many of the possessives that occur in the
current en_US.dic fail to conform to grammatical rules.

So a complete dictionary with possessives will probably take me a few weeks
more, and even then the possessives will be questionable, in much the same way
as the usage in Microsoft Word. 

David Dibble
[EMAIL PROTECTED]

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

