Hi Ivan,

Logically, whenever you have a symmetric pair of rules such as:

REP A B
REP B A

you should rather have

MAP AB

because this is what you mean: treat both characters as belonging to the same 
class (as representing the same sound or something like that). So you still 
have some redundancy in your REP rules; for example, REP s x and REP x s 
could become MAP sx. (REP ng ngh and REP ngh ng are OK because the two 
sides have different lengths.)
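
A rough way to spot such pairs automatically is sketched below in Python. 
This is only an illustration: the function name mappable_pairs is made up 
here, and it assumes that only single-character pairs are candidates for 
MAP and that the rules are already in NFC form (so each accented letter is 
one code point).

```python
def mappable_pairs(rep_rules):
    """Return sorted single-character pairs (a, b) that occur both as
    'REP a b' and as 'REP b a' in the given .aff lines."""
    seen = set()
    for line in rep_rules:
        parts = line.split()
        # Skip the 'REP <count>' header line and anything malformed.
        if len(parts) != 3 or parts[0] != "REP":
            continue
        seen.add((parts[1], parts[2]))
    # Keep each symmetric pair once (a < b) and only if both sides are
    # one character long; pairs like ng/ngh are left as REP rules.
    return sorted(
        (a, b)
        for (a, b) in seen
        if len(a) == 1 and len(b) == 1 and a < b and (b, a) in seen
    )


# A few rules taken from the REP section quoted below.
rules = [
    "REP 6",
    "REP s x",
    "REP x s",
    "REP ng ngh",
    "REP ngh ng",
    "REP d gi",
    "REP gi d",
]

for a, b in mappable_pairs(rules):
    print(f"MAP {a}{b}")
```

With the sample rules above, only the s/x pair qualifies; the ng/ngh and 
d/gi pairs stay as REP rules because their two sides differ in length.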

I suggest creating more context-dependent REP rules by looking at the 
patterns in actual spelling mistakes. This isn't trivial, but it is worth 
the effort if the suggestions still aren't right. That said, it seems to me 
that you should already be getting good suggestions :)
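
As a starting point for finding such patterns, here is a minimal Python 
sketch. It assumes you have collected a list of (misspelling, correction) 
pairs; the helper names and the sample list are invented for illustration 
(only đừong/đường comes from this thread). It strips the common prefix and 
suffix of each pair and counts how often the remaining difference recurs.

```python
import unicodedata
from collections import Counter


def differing_middle(wrong, right):
    """Return the parts of the two words left after removing their
    longest common prefix and then their longest common suffix."""
    wrong = unicodedata.normalize("NFC", wrong)
    right = unicodedata.normalize("NFC", right)
    shortest = min(len(wrong), len(right))
    i = 0
    while i < shortest and wrong[i] == right[i]:
        i += 1
    j = 0
    while j < shortest - i and wrong[len(wrong) - 1 - j] == right[len(right) - 1 - j]:
        j += 1
    return wrong[i:len(wrong) - j], right[i:len(right) - j]


def candidate_rep_rules(pairs, min_count=2):
    """Count recurring (wrong, right) differences and keep those seen at
    least min_count times and non-empty on both sides."""
    counts = Counter(differing_middle(w, r) for w, r in pairs if w != r)
    return [
        (w, r, n)
        for (w, r), n in counts.most_common()
        if w and r and n >= min_count
    ]


# Hypothetical sample data, for illustration only.
pairs = [
    ("đừong", "đường"),
    ("thừong", "thường"),
    ("sa", "xa"),
]

for wrong, right, n in candidate_rep_rules(pairs):
    print(f"REP {wrong} {right}  # seen {n} times")
```

Differences that recur often enough are candidates for new REP lines; 
whether they actually improve the suggestions still has to be checked by 
hand.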

Regards,
Marcin


On 2 July 2008 at 6:38, Iván García <[EMAIL PROTECTED]> wrote:

> Thanks Marcin, I've made use of the MAP section and got rid of many 
> REP rules. The current Vietnamese .aff rules are shown below. Do 
> you think that I still have some unnecessary rules?
> 
> SET UTF-8
> TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'-
> 
> MAP 14
> MAP aàảãáạăằẳẵắặâầẩẫấậ
> MAP AÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬ
> MAP dđ
> MAP DĐ
> MAP eèẻẽéẹêềểễếệ
> MAP EÈẺẼÉẸÊỀỂỄẾỆ
> MAP iìỉĩíị
> MAP IÌỈĨÍỊ
> MAP oòỏõóọôồổỗốộơờởỡớợ
> MAP OÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢ
> MAP uùủũúụưừửữứự
> MAP UÙỦŨÚỤƯỪỬỮỨỰ
> MAP yỳỷỹýỵ
> MAP YỲỶỸÝỴ
> 
> REP 32
> REP óa oá
> REP óe oé
> REP úy uý
> REP òa oà
> REP òe oè
> REP ùy uỳ
> REP õa oã
> REP õe oẽ
> REP ũy uỹ
> REP ỏa oả
> REP ỏe oẻ
> REP ủy uỷ
> REP ọa oạ
> REP ọe oẹ
> REP ụy uỵ
> REP uo ườ
> REP uo ướ
> REP uo ưỡ
> REP uo ưở
> REP uo ượ
> REP ch tr
> REP d gi
> REP dz d
> REP f ph
> REP g gh
> REP gh g
> REP gi d
> REP ng ngh
> REP ngh ng
> REP s x
> REP tr ch
> REP x s
> 
> 
> Many thanks.
> Ivan Garcia.
> 
> 
> Marcin Miłkowski wrote:
> > Iván García wrote:
> >> Currently, in our Vietnamese Hunspell dictionary (for Firefox and 
> >> OpenOffice), if we misspell đường as đừong, we get three suggestions:
> >>
> >> "đừ ong" (adding a space, 1 operation)
> >> "đong" (removing ừ, 1 operation)
> >> "đừng" (removing o, 1 operation)
> >>
> >> We would also like the system to propose "đường", which requires 
> >> two replacement operations: ừ -> ư and o -> ờ. Is there any way to 
> >> find out the maximum number of operations Hunspell performs? How can 
> >> that be configured in the .aff file?
> >
> > Two things might help you. First, create a MAP section where you say 
> > that all accented versions of u are equivalent to u while searching 
> > for suggestions.
> >
> > For Polish, I have set it up like this:
> >
> > MAP 8
> > MAP aą
> > MAP cć
> > MAP eę
> > MAP lł
> > MAP nń
> > MAP oóu
> > MAP sś
> > MAP zżź
> >
> > Second, look at the REP section of the .aff file. For example, in 
> > Polish, we could have:
> >
> > REP 1
> > REP ł eu
> >
> > This would mean: if you find a misspelled word, try to substitute "ł" 
> > with "eu" to see if you get a dictionary word. This is helpful when 
> > you need to replace a single character with a sequence of characters.
> >
> > Regards
> > Marcin
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> 
