Thanks Marcin, I'll replace my REP rules with MAP entries wherever both
sides have the same length.
By the way, what is the TRY line for?
And do you know where I can find better hunspell documentation than
the poor one on SourceForge?
Many thanks.
Marcin Miłkowski wrote:
Hi Ivan,
From the logical point of view, whenever you have:
REP AB BA
REP BA AB
you should rather have
MAP AB
because this is what you mean: treat both characters as belonging to the same
class (as representing the same sound or something like that). So you still
have some redundancy in your REP rules (REP ng ngh and REP ngh ng are OK
because they have different length).
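
To illustrate the check described above, here is a minimal Python sketch that scans a list of REP rules for symmetric pairs whose two sides have the same length, which are the candidates for a MAP entry. The function name symmetric_same_length_pairs is made up for this example, and the rule list is just the consonant rules from the .aff quoted further down in this thread; it is not part of hunspell.

```python
def symmetric_same_length_pairs(rep_rules):
    """Return sorted (a, b) pairs where both a->b and b->a are present
    and both sides have the same number of characters."""
    rules = set(rep_rules)
    pairs = set()
    for a, b in rules:
        if a != b and len(a) == len(b) and (b, a) in rules:
            # Store each pair once, in a stable order.
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Consonant REP rules copied from the .aff quoted in this thread.
rep_rules = [
    ("ch", "tr"), ("d", "gi"), ("dz", "d"), ("f", "ph"),
    ("g", "gh"), ("gh", "g"), ("gi", "d"), ("ng", "ngh"),
    ("ngh", "ng"), ("s", "x"), ("tr", "ch"), ("x", "s"),
]

candidates = symmetric_same_length_pairs(rep_rules)
print(candidates)
```

Pairs such as g/gh and ng/ngh are not reported because their sides differ in length, which matches the remark above that those REP rules are fine as they are.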
I suggest creating more context-dependent REP rules by simply
looking at the patterns in spelling mistakes. This isn't quite trivial but
sometimes it's worth the effort if suggestions aren't yet right. Still, it seems
to me that you should be getting good suggestions already :)
Regards,
Marcin
On 2 July 2008 at 6:38, Iván García <[EMAIL PROTECTED]> wrote:
Thanks Marcin, I've made use of the MAP section and got rid of many
REP rules. Currently the Vietnamese .aff rules are as shown below. Do
you think that I still have some unnecessary rules?
SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'-
MAP 14
MAP aàảãáạăằẳẵắặâầẩẫấậ
MAP AÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬ
MAP dđ
MAP DĐ
MAP eèẻẽéẹêềểễếệ
MAP EÈẺẼÉẸÊỀỂỄẾỆ
MAP iìỉĩíị
MAP IÌỈĨÍỊ
MAP oòỏõóọôồổỗốộơờởỡớợ
MAP OÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢ
MAP uùủũúụưừửữứự
MAP UÙỦŨÚỤƯỪỬỮỨỰ
MAP yỳỷỹýỵ
MAP YỲỶỸÝỴ
REP 32
REP óa oá
REP óe oé
REP úy uý
REP òa oà
REP òe oè
REP ùy uỳ
REP õa oã
REP õe oẽ
REP ũy uỹ
REP ỏa oả
REP ỏe oẻ
REP ủy uỷ
REP ọa oạ
REP ọe oẹ
REP ụy uỵ
REP uo ườ
REP uo ướ
REP uo ưỡ
REP uo ưở
REP uo ượ
REP ch tr
REP d gi
REP dz d
REP f ph
REP g gh
REP gh g
REP gi d
REP ng ngh
REP ngh ng
REP s x
REP tr ch
REP x s
Many thanks.
Ivan Garcia.
Marcin Miłkowski wrote:
Iván García wrote:
Currently in our Vietnamese hunspell dictionary (for Firefox and
OpenOffice), if we misspell đường as đừong, we get three suggestions:
"đừ ong" (adding a space, 1 operation)
"đong" (removing ừ, 1 operation)
"đừng" (removing o, 1 operation)
Actually we'd like the system to suggest "đường" as well, which
requires 2 replacement operations:
- replacing ừ -> ư and o -> ờ.
Is there any way to find out the maximum number of operations hunspell
performs? How can that be configured in the .aff file?
Two things might help you. First, create a MAP section where you say
that all accented versions of u are equivalent to u while searching
for suggestions.
For Polish, I have set it up like this:
MAP 8
MAP aą
MAP cć
MAP eę
MAP lł
MAP nń
MAP oóu
MAP sś
MAP zżź
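
As a rough picture of why MAP helps with the đừong case, here is a minimal Python sketch. It is a simplified model and not hunspell's actual algorithm: every character that belongs to a MAP class may be swapped for any other member of that class, at several positions at once, and only results found in the word list are kept. The names nfc, map_candidates and DICTIONARY are hypothetical, the three MAP classes are taken from the Vietnamese .aff in this thread, and the tiny word list is invented for the example.

```python
import itertools
import unicodedata


def nfc(text):
    """Normalize to precomposed characters so one letter is one code point."""
    return unicodedata.normalize("NFC", text)


def map_candidates(word, map_classes):
    """Yield every spelling obtained by swapping characters within a MAP class."""
    class_of = {}
    for cls in map_classes:
        cls = nfc(cls)
        for ch in cls:
            class_of[ch] = cls
    # For each position: the whole class if the character has one, else itself.
    options = [class_of.get(ch, ch) for ch in nfc(word)]
    for combo in itertools.product(*options):
        yield "".join(combo)


# Subset of the MAP classes from the Vietnamese .aff in this thread.
MAP_CLASSES = ["dđ", "oòỏõóọôồổỗốộơờởỡớợ", "uùủũúụưừửữứự"]
# Hypothetical word list standing in for the real dictionary.
DICTIONARY = {nfc(w) for w in ("đường", "đừng", "đong")}

suggestions = sorted(
    c for c in map_candidates("đừong", MAP_CLASSES) if c in DICTIONARY
)
print(suggestions)
```

In this model the two substitutions ừ -> ư and o -> ờ happen in a single pass, so đường is found even though it is two character changes away from the misspelling. The real suggestion engine also ranks and limits its results, which this sketch ignores.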
Then look at the REP section of the .aff file. For example, in Polish, we
could have:
REP 1
REP ł eu
This would mean: if you find a misspelled word, try to substitute "ł"
with "eu" to see if you get a dictionary word. This is helpful when
you need to replace a single character with a sequence of characters.
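
The substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not hunspell's implementation: each rule is tried at every position where its left side occurs, one replacement at a time, and the result is kept if it is in the word list. The names rep_candidates and WORDS are hypothetical, the misspelling đuong and the one-word list are invented for the example, and the rules are a subset of the Vietnamese REP rules quoted in this thread.

```python
import unicodedata


def nfc(text):
    """Normalize to precomposed characters so comparisons are reliable."""
    return unicodedata.normalize("NFC", text)


def rep_candidates(word, rep_rules):
    """Yield the word with one occurrence of a rule's left side replaced."""
    word = nfc(word)
    for wrong, right in rep_rules:
        wrong, right = nfc(wrong), nfc(right)
        start = word.find(wrong)
        while start != -1:
            yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


# Subset of the REP rules from the Vietnamese .aff in this thread.
REP_RULES = [("uo", "ườ"), ("uo", "ướ"), ("ng", "ngh")]
# Hypothetical word list standing in for the real dictionary.
WORDS = {nfc("đường")}

rep_suggestions = sorted(
    c for c in rep_candidates("đuong", REP_RULES) if c in WORDS
)
print(rep_suggestions)
```

Because a rule's two sides may differ in length, this covers cases such as ng -> ngh that a character-class MAP entry cannot express.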
Regards
Marcin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]