On Fri, 2005-08-26 at 16:07 +0300, Rich wrote:
> i have some questions regarding development of dictionaries for oo.org.
> i'm sorry if some of them are silly, but i have no knowledge of internal
> structure of these tools and processes involving their development.
> 
> 1. a) would it make sense to split dictionary by functionality (for
> example, base, computer terms, human names etc) ?

Yes, that would make sense. Further categorisations can also work, for
example splitting by word type (noun, verb, etc.) and by usage
frequency.

> b) what are the benefits and drawbacks of such an approach ?

The modularity allows for searches by topic or word type. To use an
English example: if you want to know whether all verbs have their past
tense forms, you only need to search the verb files. If you work in a
munched (compressed) format, it also makes it easier to see whether
similar words are in the same affix categories, as you would expect.
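
As a rough sketch of that last check, assuming munched entries of the
form "word/FLAGS" with single-character flags (the sample words and the
flags D and G below are invented for illustration, not taken from any
real affix file), something like this groups words by their affix flags
so that odd ones out are easy to spot:

```python
from collections import defaultdict

def group_by_flags(lines):
    """Map each (sorted) affix flag string to the words that carry it."""
    groups = defaultdict(list)
    for i, line in enumerate(lines):
        entry = line.strip()
        if not entry:
            continue
        if i == 0 and entry.isdigit():
            # the first line of a .dic file is the entry count, not a word
            continue
        word, _, flags = entry.partition("/")
        # sort the flags so that "DG" and "GD" land in the same group
        groups["".join(sorted(flags))].append(word)
    return dict(groups)

# Hypothetical contents of a verbs-only file; the flags are made up
sample = ["4", "walk/DG", "talk/GD", "jump/DG", "sing/G"]
groups = group_by_flags(sample)
for flags, words in sorted(groups.items()):
    print(flags or "(none)", words)
```

Here "sing" ends up alone in its group, which is the kind of thing you
would then check by hand.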

If you get around to grammar checking later, you will probably want the
word type splits.

> c) if a dictionary is to be split this way, does hyphenator component
> also have to be split accordingly ?
> 

No. The hyphenator doesn't work with word lists. One can generate the
hyphenation file from word lists, but this is not the only way. If you
do generate it from word lists, modularity probably won't hurt, but it
is not necessary. One can always combine different lists, but it is
much harder to separate them :-)
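
To show how cheap the combining step is, here is a minimal sketch,
assuming each module is a plain list of entries and that the combined
file should follow the MySpell .dic layout (entry count on the first
line, then one entry per line). The function name and the sample words
are only for illustration:

```python
def combine_word_lists(*word_lists):
    """Merge several word lists into the lines of a single .dic file."""
    merged = set()
    for words in word_lists:
        # strip whitespace and drop empty lines; the set removes duplicates
        merged.update(w.strip() for w in words if w.strip())
    entries = sorted(merged)
    # MySpell-style .dic files start with the number of entries
    return [str(len(entries))] + entries

# Hypothetical modules: a base list and a computer terms list
base = ["house", "tree", "walk"]
computer_terms = ["kernel", "compiler", "tree"]
dic_lines = combine_word_lists(base, computer_terms)
print("\n".join(dic_lines))
```

Note that entries are compared as whole strings, so the same word with
different flags in two modules would show up twice and would need to be
resolved by hand.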


> 3. at the page http://lingucomponent.openoffice.org/, there is text :
> 
> "MySpell is used to support spell checking in OpenOffice.org 1.x. It is 
> planned to replace MySpell with hunspell, which builds on MySpell but 
> supports Unicode and adds several other useful features."
> 
> what is the current status of spellcheck component ? are there still 
> plans to replace it ? will replacing invalidate existing dictionaries ?
> 

As already mentioned, hunspell works with MySpell dictionaries, so no
worries there. If your target users will still be using older versions
of OpenOffice.org, or other projects that use MySpell (Thunderbird,
spellbound), you might want to check what these projects' plans are for
supporting hunspell before spending too much time on hunspell-only
features.

> 4. there must be other things that are important to achieve this goal -
> there probably have been cases that we could learn from (both positive
> and negative). what are major obstacles and common mistakes ? what
> important principles must be considered ?
> 
> thanks

The dictionary tools from translate.org.za might help you. They help to
package the dictionaries for several spell projects (ispell, aspell,
openoffice, thunderbird, etc.) and contain a few other niceties. 

Good luck!

