To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=92383
Issue #|92383
Summary|submit new en_US.dic without the errors
Component|Word processor
Version|OOo 2.4.1
Platform|All
URL|
OS/Version|All
Status|UNCONFIRMED
Status whiteboard|
Keywords|
Resolution|
Issue type|ENHANCEMENT
Priority|P3
Subcomponent|programming
Assigned to|mru
Reported by|aardvark12
------- Additional comments from [EMAIL PROTECTED] Fri Aug 1 16:08:09 +0000 2008 -------
I have completed my own en_US.dic for spell checking in Open Office. This was essential. There are a large number of errors in the existing Open Office en_US.dic spelling dictionary. This is not surprising; it seems that someone used Microsoft Word to check online word lists, such as the 110,000 word list that is commonly available, and then included the approved results in the Open Office spelling dictionary. The problem is that that 110,000 word list, which is advertised as suitable for spell checking, contains about 8,000 errors. Many of those errors ended up in Microsoft Word itself. Anyone using that word list, or relying on Microsoft Word to produce error-free dictionaries, is going to end up with a spell checker that is riddled with errors.
As an example, when I look at Microsoft Word or Open Office (the errors are the same, except that Open Office has more of them), I see things like airbag [air bag], airbase [air base], pointblank [point-blank], teabag [tea bag], tealeaves [tea leaves], sanserif [sans serif], Roobbie [?], slowcoaches [?], antisemitic, antisemitism [anti-Semitic, anti-Semitism], rightsize, eageyness, or Rafaellle. Very few English words have 3 L's, yet if you use a simple search on en_US.dic, you will find other words besides Rafaellle with 3 L's. There is little sense trying to list every problem.
I also have serious difficulties with the word choices (again, all these problems stem from MS Word). The famous aviator is Lindbergh, so why put the name Lindberg in a spell checker and create problems for students? The name of the country is Liechtenstein, so why put Lichtenstein in a spell checker? (There is a Roy Lichtenstein but, apologies to Roy, nobody cares. Most people want the name of the country.) I thought that to remove the garbage from en_US.dic I might have to take out 3,000 or 4,000 words, but the actual number was much higher.
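For anyone who wants to repeat the "simple search" for words with 3 L's, here is a rough sketch in Python. It is only an illustration, not the search I actually ran. It assumes the usual MySpell/Hunspell .dic layout (an entry count on the first line, then one word per line with optional flags after a slash) and assumes the file is ISO-8859-1 encoded; check the SET line in your en_US.aff. The function names and the sample entries are made up for the example.

```python
import re
from pathlib import Path

# Any letter repeated three times in a row (case-insensitive), e.g. "lll".
TRIPLE = re.compile(r"([A-Za-z])\1\1", re.IGNORECASE)


def strip_flags(entry: str) -> str:
    """Return the word part of a .dic entry such as 'abandon/LDGS'."""
    return entry.split("/", 1)[0].strip()


def triple_letter_words(entries):
    """Yield every word that contains the same letter three times in a row."""
    for entry in entries:
        word = strip_flags(entry)
        if word and TRIPLE.search(word):
            yield word


def scan_dic(path, encoding="iso-8859-1"):
    """Scan a .dic file, skipping the first line (assumed to be the entry count)."""
    lines = Path(path).read_text(encoding=encoding).splitlines()
    return list(triple_letter_words(lines[1:]))


if __name__ == "__main__":
    # Hypothetical sample entries; replace with scan_dic("en_US.dic").
    sample = ["Rafaellle/M", "Lindbergh/M", "eagerly", "wallless"]
    print(list(triple_letter_words(sample)))
```

Every hit still has to be looked up by hand in a published dictionary; the search only narrows down where to look.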
I have seen published novels that relied on Microsoft Word. They may contain a dozen or more misspelled words. Every error jerks a reader out of the illusion created by the writing, and causes the reader to question the writer's credibility. After about six errors many readers consider discarding a book. Professionally produced books should not contain any errors--not one.
I have two novels on Amazon.com. I created my own spelling checker in 1993, and every few years I revised the word list, so I have been at this for a while. In 2001, I released WORDFUN2.ZIP (118,000 word list), which is on simtel.net. That spell checker contained many words suitable for Scrabble play. I used a different spell checker for producing books. My spelling checker was updated in 2003 and in 2006, and in May-July 2008 I did a complete check of the word list against published dictionaries, and integrated words from Open Office en_US.dic. I typically use http://dictionary.reference.com. Entered words are checked against the American Heritage Dictionary (my favorite), the Random House Unabridged Dictionary (very good), a Webster's Unabridged (can be questionable as it is so inclusive), and WordNet (not to be trusted, as some of their choices are flat out wrong).
My dictionary is very close in size to the existing en_US.dic used in Open Office. I would like to offer it as an alternative for writers or business professionals who don't want to look like idiots. (My dictionary doesn't contain "alright." Nonprofessionals think it's just fine to use "alright," and will start screaming and spitting at you that "alright" is a word. Every writer I know thinks the usage of that word is a sure indication of illiteracy. The correct usage is "all right.") I realize that most people won't care. I used to complain that the online 110,000 English word list, recommended for use in spell checkers, contained 8,000 misspellings. Nobody cared. But professionals do need an accurate word list.
The word list is available. I suppose it should be released under some sort of GNU license. I could not get MUNCH or UNMUNCH for Hunspell (the people maintaining the dictionary seem to regard it as proprietary), but I did find the program MySpell, and was able to compile MUNCH and UNMUNCH using Puppy Linux, then used MySpell MUNCH to compile a dictionary from my word list, then transferred that dictionary to Windows and used it with Hunspell. I have also replaced the existing en_US.dic in Open Office with my own version and have been testing it out. It seems to work fine.
Some work needs to be done on the possessive forms (apostrophe-S). I never used these with my own spelling checker, but instead parsed the root word. The same is true of WordPerfect: it looks at the root word, and drops the apostrophe-S. The reason for this is that ready-made possessive forms can never be accurate. English is loaded with words such as gerunds, which serve both as nouns and verbs (singing, stuffing, etc.). And there are plenty of words that function both as a noun and as an adjective. So making a sometimes noun possessive doesn't keep people from misusing it.
Most nouns take an apostrophe-S, even if they end in S: Charles's tonsils, Jones's leg. But this rule doesn't apply to many ancient or historical words, so: Moses', Isis', Achilles'. The rule says I should write "Kansas's wheat fields." But if I write "Kansas's streams," then there is too much sibilance, so the astute editor will change it to "Kansas' streams." So neither WordPerfect nor I have ever tried to codify the use of possessives, since the knowledgeable writer knows it can't be done.
Apostrophes are used for living things, personifications, or words of space, time, and weight. Also for common phrases like: heart's delight, stone's throw, and water's edge. Note that "chair's leg" does not fit this criterion.
However, the phrase "he fell back into the chair's embrace" seems to pass because chairs don't embrace, so this might be considered a personification. Most proper nouns such as Titanic or London can be used as personifications, so the names of cities, states, countries, rivers, and ships can easily take a possessive. Even words like Chemistry can take a possessive form: Department of Chemistry's examines. From this it is clear that many of the possessives that occur in the current en_US.dic fail to conform to grammatical rules. So a complete dictionary with possessives will probably take me a few weeks more, and even then the possessives will be questionable, in much the same way as the usage in Microsoft Word.
David Dibble [EMAIL PROTECTED]
---------------------------------------------------------------------
Please do not reply to this automatically generated notification from Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
