On Tuesday, September 24, 2002, at 04:26 PM, Jordi Mas wrote:

>       <Barbarism
>           word="tamany"
>             suggestion1="mida"
>             suggestion2="grandària"
>       />

For what it's worth, Jordi and I have reached a good consensus on this. 
I've made a suggestion to him regarding the XML grammar. It should look 
something like this instead:

<barbarism word="tamany">
        <suggestion word="mida" />
        <suggestion word="grandària" />
        <suggestion ... />
</barbarism>

Doing this will allow for an easily growable suggestion list and simpler 
import logic.
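
To illustrate why the import gets simpler, here is a rough sketch of a SAX-style handler for the nested grammar. The class name BarbarismImporter and its methods are hypothetical, the attribute array follows the expat convention (NULL-terminated, alternating name/value), and the parser hookup itself is left out. Standard containers stand in for our own utility classes.

```cpp
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical importer: one code path handles any number of
// <suggestion> children, so no suggestion1/suggestion2 special cases.
class BarbarismImporter {
public:
    // atts: NULL-terminated array of alternating name/value strings.
    void startElement(const char* name, const char** atts) {
        const char* word = findAttr(atts, "word");
        if (!word) return;
        if (std::strcmp(name, "barbarism") == 0) {
            m_current = word;
            m_table[m_current];  // create the entry even if no suggestions follow
        } else if (std::strcmp(name, "suggestion") == 0 && !m_current.empty()) {
            m_table[m_current].push_back(word);
        }
    }

    void endElement(const char* name) {
        if (std::strcmp(name, "barbarism") == 0) m_current.clear();
    }

    const std::map<std::string, std::vector<std::string>>& table() const {
        return m_table;
    }

private:
    static const char* findAttr(const char** atts, const char* key) {
        for (; atts && atts[0] && atts[1]; atts += 2)
            if (std::strcmp(atts[0], key) == 0) return atts[1];
        return nullptr;
    }

    std::string m_current;  // barbarism currently being parsed, if any
    std::map<std::string, std::vector<std::string>> m_table;
};
```

With the flat suggestion1/suggestion2 attributes, the importer would have to probe for each numbered attribute name in turn; here a new suggestion is just another child element.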

> * Known problems in the design
>
> - We work at word level, not sentence level. We are just hacking a 
> spell checker

I'll work on an interface and implementation for this which we can use 
later. It will necessarily resemble:

Iterator Document::getParagraphIterator()

Iterator ParagraphIterator::getSentenceIterator()
string   ParagraphIterator::getTarget()

Iterator SentenceIterator::getWordIterator()
string   WordIterator::getTarget()

Once we have a reasonably working sentence iterator, we can start 
hooking up grammar checkers. The word iterator that comes with it might 
also help clean up the massive amount of garbage in our current 
spelling queuing code.
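
As a very rough sketch of the paragraph -> sentence -> word breakdown, something like the following could serve as a starting point. The function names splitSentences and splitWords are hypothetical, plain vectors stand in for the iterator classes, and the sentence boundary rule (break after '.', '!' or '?') is a deliberate oversimplification that will trip on abbreviations.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Naive sentence split: a sentence ends at '.', '!' or '?'.
std::vector<std::string> splitSentences(const std::string& paragraph) {
    std::vector<std::string> sentences;
    std::string current;
    for (char c : paragraph) {
        // Skip whitespace between sentences.
        if (current.empty() && std::isspace(static_cast<unsigned char>(c)))
            continue;
        current += c;
        if (c == '.' || c == '!' || c == '?') {
            sentences.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) sentences.push_back(current);  // unterminated tail
    return sentences;
}

// Naive word split: ASCII whitespace and punctuation separate words.
// Bytes >= 0x80 (UTF-8 multibyte sequences) are kept as part of the word.
// Note that an apostrophe also splits, so "it's" yields "it" and "s".
std::vector<std::string> splitWords(const std::string& sentence) {
    std::vector<std::string> words;
    std::string current;
    for (char c : sentence) {
        unsigned char uc = static_cast<unsigned char>(c);
        bool separator = uc < 0x80 && (std::isspace(uc) || std::ispunct(uc));
        if (separator) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

A grammar checker would consume the output of splitSentences, while the spell checker would only need splitWords applied to each sentence.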

> - Words that can be declined have to be coded several times (plurals, 
> verbs declinations, etc). At least in Catalan, this is not very common.


On a related note, we may want to implement a multimap (1->Many) 
structure here for efficiency. We could probably get away with using a 
UT_Map or UT_StringPtrMap. The value stored for each key would be a 
UT_Vector containing UT_UTF8String pointers.

string barbarism -> string suggestion1, string suggestion2, ...
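
A minimal sketch of that mapping, using standard containers as a stand-in for the UT_StringPtrMap / UT_Vector pairing (the class name BarbarismMap and its methods are hypothetical, not existing code):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical 1->Many lookup: barbarism -> ordered list of suggestions.
class BarbarismMap {
public:
    // Appends a suggestion, creating the barbarism entry on first use.
    void addSuggestion(const std::string& barbarism,
                       const std::string& suggestion) {
        m_map[barbarism].push_back(suggestion);
    }

    // Returns the suggestions for a word, or nullptr if the word is
    // not a known barbarism.
    const std::vector<std::string>* lookup(const std::string& word) const {
        auto it = m_map.find(word);
        return it == m_map.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, std::vector<std::string>> m_map;
};
```

The lookup is a single hash probe per word, which is what matters when it runs on every word the spell checker sees.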

Cheers,
Dom
