On 13/08/13 10:30, Marco A.G.Pinto wrote:
> Hello!
>
> Thanks for your suggestion Andrea,

Please don't duplicate threads. People have a life and may need more than an hour to reply to you. If in doubt whether your message has reached the list, either configure your subscription to receive your own posts (and then don't use Gmail, as it eats them), or check the list archive:

http://mail-archives.apache.org/mod_mbox/openoffice-l10n/201308.mbox/browser

> but I haven't been able to find in the archive any file with the
> 80 examples.

It is right there, inside the /tests folder of the .tar.gz Andrea pointed you to. Each test seems to be made of several files which share the same name but end with different extensions (I'm seeing them for the first time, too):

- aff: a sample affixes file for the test
- dic: a sample dic file for the test
- sug: the list of suggestions Hunspell is expected to offer
- test: a Unix shell script that launches the test
- wrong: a list of misspelled words for which Hunspell, using the test dictionary, should offer suggestions (these should match the list in the .sug file). Depending on the test, this file may not exist.
- good: a list of correctly spelled words that Hunspell, using the test dictionary and its affixes file, should not flag as misspelled.

Keep in mind that the tests are intended to test Hunspell itself: anyone who changes the Hunspell source code and wants to be sure the changes don't break anything can run all the tests and check that the modified version still passes them. Put another way: if you're writing a tool that parses dic and aff files and derives all the words, it should pass all these tests.

> So, I have edited a word from the en_GB dictionary of Thunderbird
> and I am posting here the PFX/SFX codes in the hope that someone can
> explain how they work.
>
> Please see the image here which has basic coding and shows wrong
> results: http://i.imgur.com/JEIxOmv.png
>
> Here is the .AFF code for the word: *abdicate/DNGSn*
> Please notice that this word only has suffixes.
> Please give an
> example with prefixes too, if it doesn't give much trouble.
> (...)

I've learned by example while editing the dictionaries, so I may be wrong. Also, there are some keywords in the aff test files that I don't recognize. I've taken the shortest suffix rule you've posted, so you can get the idea:

> *S:*
> *SFX S Y 9*

SFX -> It is a suffix (PFX would mean a prefix)
S -> The flag that identifies this suffix (the "S" in abdicate/DNGSn)
Y -> Y for Yes. It means the suffix can be combined (cross product) with prefixes whose own header also says Y, so a word can take one of each at the same time. If N, this suffix can't be applied together with a prefix on the same word.
9 -> The number of rule lines that follow for this flag

> SFX S y ies [^aeiou]y

SFX -> It is a suffix (PFX would mean a prefix)
S -> The suffix identifier
y -> For a suffix, the letter(s) that must be removed from the end of the word. For a prefix, the letter(s) would be removed from the beginning of the word.
ies -> For a suffix, the letter(s) that must be added to the end of the word once the previous field has been stripped.
[^aeiou]y -> Condition, in (simplified) regexp notation. Here, the rule is only applied to words that end in "y" and whose next-to-last letter is not a, e, i, o or u. In your sample, "abdicate" doesn't end in "y", so this rule wouldn't apply.

> SFX S 0 s [aeiou]y

SFX -> It is a suffix (PFX would mean a prefix)
S -> The suffix identifier
0 -> 0 means that no character is stripped.
s -> Here, the letter "s" is added to the word.
[aeiou]y -> Condition in regexp notation. Here, the rule is only applied to words that end in "y" and whose next-to-last letter IS a, e, i, o or u.

> SFX S 0 es [sxz]

This rule applies when the word ends in s, x or z, and adds "es" to the word without removing any letter.

> SFX S 0 es [cs]h
> SFX S 0 s [^cs]h

These are the rules for words with this flag that end in h. After "ch" or "sh", add "es". After any other letter followed by "h", add "s".
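To make the fields concrete, here is a minimal Python sketch that reads PFX/SFX header and rule lines and expands a .dic entry. It is my own illustration, not Hunspell's actual code, and it uses the nine S lines from your message. It rests on several assumptions:

- flags are one character each (as in abdicate/DNGSn);
- every rule line has a condition;
- conditions are treated as ordinary regular expressions;
- continuation flags after a "/" in the affix field are ignored.

The flag R and its PFX rule are hypothetical, added only to show a prefix and the Y cross product. They are not taken from the en_GB file.

```python
import re
from dataclasses import dataclass


@dataclass
class AffixRule:
    kind: str              # "PFX" or "SFX"
    flag: str              # one-character flag, e.g. "S"
    cross: bool            # Y/N from the header line (cross product allowed?)
    strip: str             # removed from the end (SFX) or start (PFX); "" when the file says 0
    add: str               # appended (SFX) or prepended (PFX); "" when the file says 0
    condition: re.Pattern  # anchored at the end (SFX) or start (PFX) of the root word

    def matches(self, word: str) -> bool:
        """True if this rule may be applied to the root word."""
        if self.kind == "SFX":
            return word.endswith(self.strip) and self.condition.search(word) is not None
        return word.startswith(self.strip) and self.condition.search(word) is not None

    def attach(self, word: str) -> str:
        """Strip, then add; assumes matches() was already checked on the root."""
        if self.kind == "SFX":
            return word[: len(word) - len(self.strip)] + self.add
        return self.add + word[len(self.strip):]


def parse_affixes(aff_text: str) -> dict:
    """Map each flag to its list of rules; lines other than PFX/SFX are ignored."""
    cross, rules = {}, {}
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in ("PFX", "SFX"):
            continue
        kind, flag = fields[0], fields[1]
        if len(fields) == 4:                      # header line, e.g. "SFX S Y 9"
            cross[(kind, flag)] = fields[2] == "Y"
            continue
        strip = "" if fields[2] == "0" else fields[2]
        add = fields[3].split("/")[0]             # simplification: drop continuation flags
        add = "" if add == "0" else add
        cond = fields[4]
        pattern = re.compile(cond + "$" if kind == "SFX" else "^" + cond)
        rules.setdefault(flag, []).append(
            AffixRule(kind, flag, cross.get((kind, flag), False), strip, add, pattern))
    return rules


def expand(entry: str, rules: dict) -> set:
    """Generate all forms of a .dic entry such as 'abdicate/S'."""
    word, _, flags = entry.partition("/")
    applicable = [r for f in flags for r in rules.get(f, []) if r.matches(word)]
    suffixes = [r for r in applicable if r.kind == "SFX"]
    prefixes = [r for r in applicable if r.kind == "PFX"]
    forms = {word}
    forms.update(s.attach(word) for s in suffixes)
    for p in prefixes:
        forms.add(p.attach(word))
        if p.cross:  # prefix + suffix together only when both headers say Y
            forms.update(p.attach(s.attach(word)) for s in suffixes if s.cross)
    return forms


# The S rules are the ones from Marco's message; R is HYPOTHETICAL.
AFF = """\
SFX S Y 9
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxz]
SFX S 0 es [cs]h
SFX S 0 s [^cs]h
SFX S 0 s [ae]u
SFX S 0 x [ae]u
SFX S 0 s [^ae]u
SFX S 0 s [^hsuxyz]
PFX R Y 1
PFX R 0 re .
"""

rules = parse_affixes(AFF)
print(sorted(expand("abdicate/DNGSn", rules)))  # ['abdicate', 'abdicates']
print(sorted(expand("bureau/S", rules)))        # ['bureau', 'bureaus', 'bureaux']
# Hypothetical entry using the made-up R prefix:
print(sorted(expand("abdicate/RS", rules)))     # ['abdicate', 'abdicates', 'reabdicate', 'reabdicates']
```

The sample only defines S and R, so the D, N, G and n flags of abdicate/DNGSn are simply skipped here.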
> SFX S 0 s [ae]u
> SFX S 0 x [ae]u

Words with this flag that end in u and whose next-to-last letter is a or e can form derived words by adding either an s or an x (as in "bureaus" and "bureaux").

> SFX S 0 s [^ae]u

Words with this flag that end in u and whose next-to-last letter is NOT a or e can only form derived words by adding an s.

> SFX S 0 s [^hsuxyz]

If you've read this far, you should have deduced that this is the rule that applies to the word "abdicate" (since it does not end in h, s, u, x, y or z). The rule strips no letters and adds "s", creating "abdicates".

Prefixes work the same way, but at the beginning of the word.

I guess you know by now that thesaurus and hyphenation each use files separate from the aff/dic files. I don't know anything about hyphenation, and I also learned about thesaurus by (broken) examples.

Good luck with your software. To be honest, I was thinking of writing a Java program for the aff/dic part, but I'm lazy and I thought the first thing to do would be to port Hunspell to Java, which is a huge task for me (I'd rather not use JNI).

-- 
Ricardo Palomares (RickieES)
Diaspora: https://diasp.eu/u/rickiees
Skype: rickie0341971
Jabber: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
