On Sat, 2002-09-21 at 11:31, Jordi Mas wrote:
> Well, I think that we need a solution that marks the misspelled word and
> offers a replacement, since that's what a user will expect from a spell checker.
> I do not think that we can do this using ispell since it just does not work
> that way.
One of my questions is "should these words be marked as misspelled at all?" Words that are "incorrect" but widely used sound like good candidates for addition to a language, regardless of what the French and Spanish purists/governments think. Languages are fluid, evolving things that borrow heavily from their times, surroundings, other languages and cultures, technology/science, and the people who speak them.

But let's say, for the sake of argument, that these words really should be marked as incorrect. If they really aren't allowable words, is this a spell-checking problem? I'd argue no. Spell checking solves the problem of mapping:

    possibly misspelled word -> correctly spelled word(s)

and not:

    possibly suboptimal/illegal word -> better/legal word(s)

In my opinion, this is a different but related problem, one more closely tied to a language's constructs (i.e. something closer to grammar) than to the spelling of its words. If you argue that a misspelled word is "suboptimal" or "illegal", you would be correct in a sense. But in that case the user's intent was to write a legal/optimal word. In your case, the user's intent is to write a correctly spelled "suboptimal" word.

Should some sort of warning pop up? Maybe, but that could get annoying really fast if I honestly mean to use these words. And I don't want to disable spell checking, because I really do want the rest of my words checked. So what all of this shows is that this is possibly a proofing problem, not a spell-checking problem. Read on.

> I see this as an enhancement to the spell checking system and it most likely will
> take under 100 lines of code. Any particular reason that makes you think that
> this is not appropriate for our main tree?

Because I don't see this as a spelling issue, and I don't believe that it will take only 100 lines to get it right. Consider just the following cases.
Please tell me how to fix them without creating a *huge* barbarism file, and how to properly identify and handle them in under 100 lines of code:

*) Mixed capitalization (ComPutEr)
*) Different verb tenses (compute, computed, has computed)
*) Pluralization (computes, computers)
*) Split infinitives
*) The "barbaric" word is misspelled. You'd need to do at least 2 mappings here to get the intended effect:

   misspelled barbaric -> correct barbaric -> preferable word

Note that this is just what I could think of in 30 seconds; it isn't an exhaustive study of the problem at hand. I see this as a separate service that we could provide in addition to spell checking, but it is certainly not spell checking.

> Alan has suggested that we can implement this as an enhanced custom.dic for
> every language. It makes sense to me. What do you think?

I don't think that you can achieve this by using a custom.dic for every language, since custom.dic only holds a list of words you mark as "allowable" or "correctly spelled" for a language. It doesn't offer a mapping from wrong->correct word, and it doesn't use any algorithm (e.g. soundex, visual similarity) to suggest words. Going this route would, in my estimation, involve writing something nearly equivalent to ispell in both size and scope.

> Dom, I think that we should discuss this a bit more to see how good it can be
> for other languages, and finally, if it is not useful, we move it to a plugin, but
> I think that it is a bit early to say "I don't want this in the main tree.".

You asked if people had objections. I had one. It seems silly to basically say "I'm looking for objections" and then tell me that "It's too early to object, wait until we discuss more," especially since your message didn't even mention the possibility of discussion. Your email basically said "Here is perceived problem X. Does anyone want to stop me, because I'm about to implement something to fix perceived problem X?" The logic seems a bit flawed, at least to me...
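To make the "at least 2 mappings" case concrete, here is a minimal Python sketch. Everything in it is hypothetical and illustrative: DICTIONARY stands in for a plain word list like custom.dic, BARBARISMS is an invented one-entry barbarism table, and difflib.get_close_matches is only a stand-in for a real suggestion algorithm (it is not what ispell does). Note that the barbarism lookup only fires after a fuzzy spelling step, which a plain list of allowed words cannot provide.

```python
import difflib

# Hypothetical toy data, for illustration only.
# DICTIONARY plays the role of a list of correctly spelled words (like custom.dic).
DICTIONARY = {"regardless", "irregardless", "computer", "compute"}
# BARBARISMS maps a correctly spelled but discouraged word to a preferred one.
BARBARISMS = {"irregardless": "regardless"}


def spell_suggestions(word, n=3):
    """Mapping 1 (spell checking): possibly misspelled word -> correctly spelled word(s)."""
    w = word.casefold()  # handles mixed capitalization such as "ComPutEr"
    if w in DICTIONARY:
        return [w]
    # Stand-in similarity measure; a real checker would use its own algorithm.
    return difflib.get_close_matches(w, sorted(DICTIONARY), n=n, cutoff=0.8)


def preferred_forms(word):
    """Mapping 2 (proofing): chain the spelling suggestions into the barbarism table."""
    results = []
    for candidate in spell_suggestions(word):
        preferred = BARBARISMS.get(candidate)
        if preferred is not None and preferred not in results:
            results.append(preferred)
    return results


if __name__ == "__main__":
    for w in ["ComPutEr", "IrRegardless", "irregardles"]:
        print(w, spell_suggestions(w), preferred_forms(w))
```

Even this toy only covers mixed capitalization and the two-step chain; verb tenses, plurals, and split infinitives would each need morphological or grammatical knowledge on top of it.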
Is this something useful, in my opinion? Maybe/probably. Would I object to it being a plugin? Probably not. Do I still object to it being in the main tree? Yup.

Cheers,
Dom
