Hello! I'm the current maintainer of the aspell-ro package, and I want to expand it. Please excuse my English and my ignorance. If I ask a trivial question, just point me to a link where I can get more details.
First, I recently had to spell check a few huge documents, and I was quite impressed by aspell's results. If aspell knows the intended word, that word is in most cases the first suggestion. I have never seen such results with other spell checkers. That is probably because most spell checkers were initially made for English, and Romanian (a Romance language) is quite different.
Anyway, here's one issue: in Romanian we use the hyphen (-) in about the same way English uses the apostrophe ('). For things like "it is" -> "it's", it is simpler to just make the program learn the short version as well. But there are also cases (mostly in poetry) where some letters are dropped and several words are joined with hyphens. For this case I think it would be a good idea if aspell could recognize not the "composed" word but its component parts (and recognize the short versions). Is this possible?
Also, when a word is only partially reproduced (to mimic live speech), we put an apostrophe at the point where letters are dropped, like in the word "ma'am" in American English. When the apostrophe is in the middle it's easy: make aspell learn the new "word". But if the apostrophe is at the beginning or the end of the word, things get more complicated: if the apostrophe is missing, the word is a misspelling; if it is present, it is the case mentioned above. Can I make aspell recognize the apostrophe? From what I see, it interprets it as a punctuation mark (if it's at the beginning or the end of the word) and leaves it out.
How does aspell learn words? By word root plus possible prefixes and suffixes, or by learning all word forms one by one? I'm no linguist :( so I couldn't compile a complete list of forms for each word. I'm just wondering whether I can add the possible affixes to the root. It seems cleaner somehow.
The final issue (even more twisted than the ones above ;-): because of badly implemented support for Romanian characters, many documents are written without diacritics. So instead of î we have i, and so on. This gives the spell checker a very particular task (I don't know of any other language with such an issue): adding the diacritics. That would be easy with only a short list of words. With a complete word list, things get more complicated. A word ending in 'a' may mean it has the article 'the' (as in English), while the exact same word ending in 'ă' instead of 'a' means it has the article 'a' (as in English). Both words are correct, but in ASCII text a lot of corrections would be missed. One hack is to keep only the form ending in 'ă' in the dictionary and just ignore the cases where no change is needed. Is there a nicer way to do this? Or at least a way to keep the full dictionary for cases where someone has to check a text that does have diacritics?
Also, I didn't find a way to dump the whole word list, as I seem to have lost the original word list (I now have only the 'compiled' version).
And one bug report (I will probably add it to the SourceForge bug tracker later today, if there isn't already a similar report): if I mistakenly type an 'm' after a word instead of a comma ',', then spell check the text, choose replace, and type 'word,' in place of 'wordm', aspell does not leave the punctuation mark out and tries to spell check it as well.
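
To make the diacritics problem described above more concrete, here is a minimal Python sketch. It does not use aspell or show how aspell works internally; it only illustrates the ambiguity. The sample word list and the function names (`strip_diacritics`, `build_index`, `candidates`) are assumptions made up for this illustration.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word):
    """Return the word with combining marks removed (e.g. 'ă' -> 'a')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(words):
    """Map each ASCII-folded spelling to all dictionary words that fold to it."""
    index = defaultdict(set)
    for w in words:
        index[strip_diacritics(w)].add(unicodedata.normalize("NFC", w))
    return index


def candidates(ascii_word, index):
    """List every dictionary word an ASCII-only spelling could stand for."""
    return sorted(index.get(ascii_word, ()))


# Hypothetical sample word list, only for illustration.
WORDS = ["casa", "casă", "în", "țară", "mână"]
INDEX = build_index(WORDS)

print(candidates("in", INDEX))    # one candidate: safe to restore automatically
print(candidates("casa", INDEX))  # two candidates: needs context or the user
```

When a folded spelling maps to a single word, the diacritics could be restored without asking. When it maps to several (the 'a' versus 'ă' ending), a plain word list cannot decide, which is exactly where the "keep only the 'ă' form" hack loses information.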
_______________________________________________ Aspell-devel mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/aspell-devel