On Fri, 2005-08-26 at 16:07 +0300, Rich wrote:
> i have some questions regarding development of dictionaries for oo.org.
> i'm sorry if some of them are silly, but i have no knowledge of internal
> structure of these tools and processes involving their development.
>
> 1. a) would it make sense to split dictionary by functionality (for
> example, base, computer terms, human names etc) ?

It would make sense. Further categorisations can also work, for example
splitting by word type (noun, verb, etc.) and by usage frequency.

> b) what are the benefits and drawbacks of such an approach ?

The modularity allows for searches based on topic or word type. To use
an English example: if you want to know whether all verbs have their
past tense forms, you only need to search the verb files. If you work in
a munched (compressed) format, it also makes it easier to see whether
similar words are in the affix categories you expect. If you get to do
grammar checking later, you will probably want the word type splits.

> c) if a dictionary is to be split this way, does hyphenator component
> also have to be split accordingly ?
>

No. The hyphenation doesn't work with word lists. One can generate the
hyphenation file from word lists, but this is not the only way. If one
does generate it from word lists, modularity probably won't hurt, but it
is not necessary. One can always combine different lists, but it is much
harder to separate them :-)

> 3. at the page http://lingucomponent.openoffice.org/, there is text :
>
> "MySpell is used to support spell checking in OpenOffice.org 1.x. It is
> planned to replace MySpell with hunspell, which builds on MySpell but
> supports Unicode and adds several other useful features."
>
> what is the current status of spellcheck component ? are there still
> plans to replace it ? will replacing invalidate existing dictionaries ?
>

As already mentioned, hunspell works with myspell dictionaries, so no
worries there. If your target users will still be using "older" versions
of OpenOffice.org, or other projects that use myspell (Thunderbird,
spellbound), you might want to check what those projects' plans are for
supporting hunspell before spending too much time on hunspell-only
features.

> 4. there must be other things that are important to achieve this goal -
> there probably have been cases that we could learn from (both positive
> and negative).
> what are major obstacles and common mistakes ? what
> important principles must be considered ?
>
> thanks

The dictionary tools from translate.org.za might help you. They help to
package the dictionaries for several spell-checking projects (ispell,
aspell, openoffice, thunderbird, etc.) and contain a few other niceties.

Good luck!
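
P.S. To illustrate the remark that combining lists is easier than
separating them, here is a rough sketch (not taken from any existing
tool) of merging per-category word lists into one MySpell-style .dic
file, which starts with an entry count followed by one entry per line.
The function names and file names are made up for the example, and it
assumes the source files are plain lists with one entry per line
(optionally with /FLAGS) and no count line of their own.

```python
from pathlib import Path


def merge_word_lists(paths):
    """Combine several word-list files into one sorted, de-duplicated list.

    Assumption: each file holds one entry per line, e.g. "walk/D" or "Anna",
    and does NOT start with a count line.
    """
    entries = set()
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            entry = line.strip()
            if entry:  # skip blank lines
                entries.add(entry)
    return sorted(entries)


def format_dic(entries):
    """Return .dic text: the entry count first, then one entry per line."""
    return "\n".join([str(len(entries)), *entries]) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; substitute your own category files.
    merged = merge_word_lists(["base.txt", "verbs.txt", "names.txt"])
    Path("combined.dic").write_text(format_dic(merged), encoding="utf-8")
```

Note that only exact duplicate lines are dropped: if the same word
appears with different affix flags in two files, both entries are kept
and you would have to reconcile them by hand.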
