Hi!

On Sunday 30 September 2007, Robert Ludvik wrote:
> ...
> In just a few words: people can send words, that are not yet in spell
> check dictionary trough a web form or with a help of a macro, which is
> for now only available for OOo but could be ported to MSO, KOffice(?).
> Relevant people (linguists) would then review sent words and accept
> them for inclusion in dictionaries or reject them.
> Dictionaries are in form that can be used for Mozilla and KOffice
> products as well.
> I'd like to open a discussion about this. If you are interested, you
> can read some more at http://r.aufbix.org/spell/, especially a *draft*
> of proposal how this could be done
> (http://r.aufbix.org/spell/spell-workflow.pdf or
> http://r.aufbix.org/spell/spell-workflow.odg, if you prefer)

I can offer some comments, because our development workflow for the Finnish
spell checker shares some features with your draft and has been in use for
about a year now.

- We do not have an OOo macro for sending suggestions, but I think it is a 
great idea. We do have a web form [1] though. The form consists of a field to 
enter the word, a drop-down box for selecting the type of the word ("general 
vocabulary", "computing vocabulary", "medical vocabulary", ... , "foreign 
words", "dialects", "words that should be removed from current vocabulary") 
and a free-form text box for explaining the word if it needs an explanation.
The form has not been very popular: on average we get about one word per day
through it. Perhaps we should have advertised it more.

Previously we had a form that only contained a field to enter the word and a 
drop-down box for word class. That one was initially perhaps too popular: it
was occasionally misused by spamming it with useless strings. We have never
collected any personal information through these forms. We only track the
user's IP address, to limit incoming suggestions to 20 words per IP address
per day and prevent misuse. But some smart person worked around that
limitation by using Tor to access the form... So I recommend building the
system so that the database
can be easily cleaned up if something like this happens.
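
To make the last two points concrete, here is a minimal sketch of a per-IP
daily limit together with a cleanup helper. It is not the code of our
application: the table layout and the names open_db, submit and purge are
made up for illustration, and it uses SQLite only to stay self-contained.

```python
import sqlite3
from datetime import date

DAILY_LIMIT = 20  # suggestions accepted per IP address per day


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) suggestion table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS suggestion ("
        " id INTEGER PRIMARY KEY,"
        " word TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " ip TEXT NOT NULL,"
        " day TEXT NOT NULL)"
    )
    return db


def submit(db, word, category, ip, day=None):
    """Store a suggestion; return False if this IP has used up its quota."""
    day = day or date.today().isoformat()
    (used,) = db.execute(
        "SELECT COUNT(*) FROM suggestion WHERE ip = ? AND day = ?", (ip, day)
    ).fetchone()
    if used >= DAILY_LIMIT:
        return False
    db.execute(
        "INSERT INTO suggestion (word, category, ip, day) VALUES (?, ?, ?, ?)",
        (word, category, ip, day),
    )
    db.commit()
    return True


def purge(db, ip=None, day=None):
    """Delete pending suggestions by IP and/or day; return rows removed."""
    clauses, params = [], []
    if ip is not None:
        clauses.append("ip = ?")
        params.append(ip)
    if day is not None:
        clauses.append("day = ?")
        params.append(day)
    if not clauses:
        raise ValueError("refusing to delete everything: give ip and/or day")
    cur = db.execute(
        "DELETE FROM suggestion WHERE " + " AND ".join(clauses), params
    )
    db.commit()
    return cur.rowcount
```

Keeping the submission day next to each suggestion is what makes cleanup
cheap when the spam arrives from many addresses (as with Tor): filtering by
IP no longer helps, but purge(db, day=...) can still drop a whole bad day.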

It should be noted that Finland has a population of only 5 million people. And
the majority of Finnish OOo users (especially on Windows) are still using a 
non-free spell checker (released around 2002) for which our word suggestion 
form is useless. Therefore most language teams could probably expect this 
type of form to be more popular than what we have experienced.

- The review system we use is a lot simpler than the one in your draft. We 
only have one compulsory review step for the suggested words, where a 
registered user of the system either rejects the suggestion or moves it to 
the master database, and populates the new record with the necessary
metadata (inflection class etc.). However, the system maintains a change
log [2] of all changes made to the master database. Our project has three 
active contributors, and we more or less regularly check each other's changes 
from the log. So in practice there is an extra round of reviews, although it 
is not enforced by the software.
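
As an illustration only, the single review step plus change log can be
sketched like this. The class and method names, and the inflection class
label in the usage note, are hypothetical; the real application keeps this
data in a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Vocabulary:
    suggestions: dict = field(default_factory=dict)  # word -> free-form note
    master: dict = field(default_factory=dict)       # word -> metadata
    changelog: list = field(default_factory=list)    # (time, user, action, word)

    def suggest(self, word, note=""):
        """Anonymous users add words to the pending suggestions."""
        self.suggestions[word] = note

    def accept(self, user, word, inflection_class):
        """The one compulsory step: move a suggestion to the master data."""
        note = self.suggestions.pop(word)  # KeyError if the word is not pending
        self.master[word] = {"inflection_class": inflection_class, "note": note}
        # Every change to the master data is logged with the user who made it.
        self.changelog.append((datetime.now(timezone.utc), user, "accept", word))

    def reject(self, user, word):
        """Drop a suggestion; the master data is untouched, so nothing is logged."""
        del self.suggestions[word]

    def changes_by_others(self, reviewer):
        """Log entries a reviewer should look at: those made by someone else."""
        return [entry for entry in self.changelog if entry[1] != reviewer]
```

The informal second review is then just a matter of each contributor reading
changes_by_others(their_name) from time to time, for example after
accept("harri", "kissa", "noun-a").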

I think that for a small team like ours this simplified review works just 
fine. We do not have any professional linguists in this project anyway. I 
suppose this is the case for many other languages too. So if possible, it 
would be nice to be able to merge the non-linguist and linguist reviews in 
case some teams cannot afford to have both.

- The role of the technician at the end of the process is more or less similar 
in our process and your draft. The only problem we have is that our spell
checker implementation does not allow merging dictionaries at runtime. This is why
there is currently no easy way for the users to add medical etc. 
dictionaries, which in turn discourages people from contributing to them. 
This is a technical problem that we must solve later. I believe that Hunspell 
does not have this problem.
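
Until runtime merging is possible, one conceivable workaround is to merge
the selected vocabularies when the dictionary is built. The sketch below
assumes plain word lists and a hypothetical function name; it ignores the
inflection data that a real build would also have to carry along.

```python
def merge_vocabularies(base, extras, selected):
    """Return one sorted word list: base plus the selected extra vocabularies.

    base     -- iterable of general-vocabulary words
    extras   -- dict mapping a vocabulary name to an iterable of words
    selected -- names of the extra vocabularies the user asked for
    """
    words = set(base)
    for name in selected:
        words.update(extras[name])  # KeyError for an unknown vocabulary name
    return sorted(words)
```

This removes duplicates between vocabularies, but it means one prebuilt
dictionary per combination of vocabularies, which is exactly the kind of
inconvenience that runtime merging would avoid.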


Of course the code of our web application is available to any teams who wish 
to use it, since it is under the GPL. The core code has been designed to be 
language independent and the application itself can be localised using po 
files. But it does have a major limitation in that the same database cannot
be used simultaneously for multiple languages, and most of the technical
documentation has not been written yet. It is also written in Python, not
PHP, and it cannot (yet) export to the Hunspell format. So I think that
your proposed workflow, macros and PHP scripts will offer a better initial 
design for solving the dictionary update and maintenance problem for many 
languages.

Harri

[1] http://joukahainen.lokalisointi.org/ehdotasanoja
[2] http://joukahainen.lokalisointi.org/query/listchanges
