Iván García wrote:
Thanks Marcin, I'll convert my REP rules into MAP entries wherever both sides have the same length.

By the way, what is the TRY directive for?

Read the manual :P

"TRY sets the change characters for suggestions."

It defines the characters to try as replacements when looking for suggestions, and, AFAIR, their order matters as well: more frequent characters should come first. Since hardly any Polish word contains "x", it is moved to the end of the string. In the hunspell distribution, you'll find some example dictionaries for testing.
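To illustrate why the order of the TRY string can matter, here is a simplified Python sketch. It is not hunspell's code. It models only one kind of edit (replacing a single character with each TRY character in turn), and it assumes, as Marcin recalls, that candidates are generated in TRY order. The function name `try_suggestions` and the toy dictionary are hypothetical.

```python
from __future__ import annotations


def try_suggestions(word: str, try_chars: str, dictionary: set[str]) -> list[str]:
    """Replace one character at a time, trying characters in TRY-string order."""
    found = []
    for i, original in enumerate(word):
        for ch in try_chars:
            if ch == original:
                continue  # replacing a character with itself gives the same word
            candidate = word[:i] + ch + word[i + 1:]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


print(try_suggestions("cxt", "esianrtolcdugmphbyfvkwz", {"cot", "cat"}))  # ['cat', 'cot']
```

With a TRY string that starts with "o", "cot" would be found before "cat". If a speller caps the number of suggestions or the time spent on them, putting frequent letters first keeps the likely corrections in the list.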

And do you know where I can find better hunspell documentation than the poor one on SourceForge?

Look at the examples supplied in the source distribution; they usually come with some explanations.

Regards
Marcin

Many thanks.

Marcin Miłkowski wrote:
Hi Ivan,

From the logical point of view, whenever you have:

REP AB BA
REP BA AB

you should rather have

MAP AB

because this is what you mean: treat both characters as belonging to the same class (as representing the same sound, or something like that). So you still have some redundancy in your REP rules (REP ng ngh and REP ngh ng are OK because their two sides have different lengths).
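Reading Marcin's rule as a pair of REP entries that swap two single characters in both directions, such redundant pairs can be spotted mechanically. The Python sketch below is a hypothetical helper, not a hunspell tool. It only looks at single-character pairs, because a MAP line such as `MAP dđ` groups individual characters. The `rep` list is a subset of the REP rules Iván posted in this thread.

```python
from __future__ import annotations


def symmetric_single_char_pairs(rep_rules: list[tuple[str, str]]) -> list[str]:
    """Return MAP-style entries for one-character REP rules present in both directions."""
    rules = set(rep_rules)
    entries = []
    for src, dst in rep_rules:
        if len(src) == 1 and len(dst) == 1 and (dst, src) in rules:
            entry = "".join(sorted(src + dst))  # same entry for (s, x) and (x, s)
            if entry not in entries:
                entries.append(entry)
    return entries


rep = [("ch", "tr"), ("d", "gi"), ("g", "gh"), ("gh", "g"),
       ("s", "x"), ("tr", "ch"), ("x", "s")]
print(symmetric_single_char_pairs(rep))  # ['sx']
```

Here only `REP s x` / `REP x s` qualifies and could become `MAP sx`. The g/gh and d/gi rules change length, and ch/tr are two-character sequences, so they are left alone.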

I suggest creating more context-dependent REP rules by looking at the patterns in actual spelling mistakes. This isn't entirely trivial, but it is sometimes worth the effort if the suggestions still aren't right. That said, it seems to me that you should already get good suggestions :)
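One possible way to start on context-dependent REP rules is sketched below in Python. This is an assumed workflow, not a hunspell tool: strip the common prefix and suffix from each (misspelling, correction) pair and count the fragments that differ. The word pairs are hypothetical examples, and frequent fragments are only candidates for REP lines, to be reviewed by hand.

```python
from __future__ import annotations

import unicodedata
from collections import Counter


def differing_span(wrong: str, right: str) -> tuple[str, str]:
    """Strip the longest common prefix and suffix and return the parts that differ."""
    wrong = unicodedata.normalize("NFC", wrong)
    right = unicodedata.normalize("NFC", right)
    shortest = min(len(wrong), len(right))
    prefix = 0
    while prefix < shortest and wrong[prefix] == right[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix overlaps the prefix that was already matched.
    while suffix < shortest - prefix and wrong[-1 - suffix] == right[-1 - suffix]:
        suffix += 1
    return wrong[prefix:len(wrong) - suffix], right[prefix:len(right) - suffix]


def rep_candidates(pairs: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Count differing fragments, most frequent first, formatted as REP lines."""
    counts = Counter(differing_span(wrong, right) for wrong, right in pairs)
    # Pure insertions or deletions (one side empty) cannot be written as a REP line.
    return [(f"REP {src} {dst}", n) for (src, dst), n in counts.most_common() if src and dst]


pairs = [("fong", "phong"), ("fải", "phải"), ("đừong", "đường")]  # hypothetical data
for line, n in rep_candidates(pairs):
    print(n, line)
```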

Regards,
Marcin


On 2 July 2008, 6:38, Iván García <[EMAIL PROTECTED]> wrote:

Thanks Marcin, I've made use of the MAP section and got rid of many REP rules. The Vietnamese .aff rules currently look as shown below. Do you think I still have some unnecessary rules?

SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'-

MAP 14
MAP aàảãáạăằẳẵắặâầẩẫấậ
MAP AÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬ
MAP dđ
MAP DĐ
MAP eèẻẽéẹêềểễếệ
MAP EÈẺẼÉẸÊỀỂỄẾỆ
MAP iìỉĩíị
MAP IÌỈĨÍỊ
MAP oòỏõóọôồổỗốộơờởỡớợ
MAP OÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢ
MAP uùủũúụưừửữứự
MAP UÙỦŨÚỤƯỪỬỮỨỰ
MAP yỳỷỹýỵ
MAP YỲỶỸÝỴ

REP 32
REP óa oá
REP óe oé
REP úy uý
REP òa oà
REP òe oè
REP ùy uỳ
REP õa oã
REP õe oẽ
REP ũy uỹ
REP ỏa oả
REP ỏe oẻ
REP ủy uỷ
REP ọa oạ
REP ọe oẹ
REP ụy uỵ
REP uo ườ
REP uo ướ
REP uo ưỡ
REP uo ưở
REP uo ượ
REP ch tr
REP d gi
REP dz d
REP f ph
REP g gh
REP gh g
REP gi d
REP ng ngh
REP ngh ng
REP s x
REP tr ch
REP x s


Many thanks.
Ivan Garcia.
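The numbers on the `MAP 14` and `REP 32` header lines declare how many entries follow, so they must agree with the actual entries. A quick consistency check can be sketched in Python. `table_counts` is a hypothetical helper, not part of hunspell, and it only inspects the MAP and REP tables.

```python
from __future__ import annotations

from collections import Counter


def table_counts(aff_text: str) -> dict[str, tuple[int, int]]:
    """Return {directive: (declared, actual)} for the MAP and REP tables."""
    declared: dict[str, int] = {}
    actual: Counter = Counter()
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] not in ("MAP", "REP"):
            continue
        name = parts[0]
        if name not in declared and parts[1].isdigit():
            declared[name] = int(parts[1])  # header line, e.g. "MAP 14"
        else:
            actual[name] += 1  # one table entry
    return {name: (declared[name], actual[name]) for name in declared}


sample = """MAP 2
MAP dđ
MAP yỳỷỹýỵ
REP 2
REP f ph
"""
for name, (want, got) in table_counts(sample).items():
    status = "ok" if want == got else "MISMATCH"
    print(f"{name}: declared {want}, found {got} ({status})")
```

For the listing above, the check should find 14 of 14 MAP entries and 32 of 32 REP entries.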


Marcin Miłkowski wrote:
Iván García wrote:
Currently, in our Vietnamese hunspell dictionary (for Firefox and OpenOffice), if we misspell đường as đừong, we get three suggestions:

"đừ ong" (adding space, 1 operation)
"đong" (removing ừ, 1 operation)
"đừng" (removing o, 1 operation)

Actually, we'd like the system to suggest "đường" as well, which requires two replacement operations: replacing ừ with ư and o with ờ. Is there any way to find out the maximum number of operations hunspell performs? How can that be configured in the .aff file?

Two things might help you. First, create a MAP section stating that all accented versions of u are equivalent to u while searching for suggestions.

For Polish, I have set it up like this:

MAP 8
MAP aą
MAP cć
MAP eę
MAP lł
MAP nń
MAP oóu
MAP sś
MAP zżź
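The effect of a MAP section can be modelled with a small Python sketch: every character of the misspelling may be swapped for any member of its MAP class, and the resulting strings are looked up in the dictionary. This is a simplified, brute-force model for illustration, not hunspell's implementation. The class strings come from Iván's Vietnamese MAP lines; `map_suggestions` and the two-word dictionary are hypothetical. Because every position may change at once in this model, the two replacements needed for đường (ừ to ư and o to ờ) are found in a single pass.

```python
from __future__ import annotations

import itertools
import unicodedata


def nfc(text: str) -> str:
    """Use precomposed characters so that, e.g., ừ is a single code point."""
    return unicodedata.normalize("NFC", text)


def map_suggestions(word: str, map_classes: list[str], dictionary: set[str]) -> list[str]:
    """Try every combination of MAP-equivalent characters and keep dictionary words."""
    word = nfc(word)
    classes = [nfc(c) for c in map_classes]
    words = {nfc(w) for w in dictionary}
    # For each position: the character itself plus the other members of its class.
    options = []
    for ch in word:
        alternatives = [ch]
        for cls in classes:
            if ch in cls:
                alternatives += [c for c in cls if c not in alternatives]
        options.append(alternatives)
    found = []
    for combo in itertools.product(*options):
        candidate = "".join(combo)
        if candidate != word and candidate in words and candidate not in found:
            found.append(candidate)
    return found


vietnamese_map = ["dđ", "uùủũúụưừửữứự", "oòỏõóọôồổỗốộơờởỡớợ"]
print(map_suggestions("đừong", vietnamese_map, {"đường", "đong"}))  # ['đường']
```

The product of alternatives grows quickly with word length, so a real speller would need to prune this search; the sketch only shows which candidates a MAP class makes reachable.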

Second, look at the REP section in the .aff file. For example, in Polish, we could have:

REP 1
REP ł eu

This would mean: if you find a misspelled word, try to substitute "ł" with "eu" to see if you get a dictionary word. This is helpful when you need to replace a single character with a sequence of characters.
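A REP rule can be pictured with the following Python sketch, which replaces one occurrence of the pattern at a time and keeps the results that are dictionary words. It is an illustrative model rather than hunspell's code; `rep_suggestions` and the toy dictionary are hypothetical, and the rules are taken from Iván's Vietnamese list.

```python
from __future__ import annotations


def rep_suggestions(word: str, rep_rules: list[tuple[str, str]], dictionary: set[str]) -> list[str]:
    """Apply each REP rule at every place its pattern occurs, one occurrence at a time."""
    found = []
    for pattern, replacement in rep_rules:
        start = word.find(pattern)
        while start != -1:
            candidate = word[:start] + replacement + word[start + len(pattern):]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
            start = word.find(pattern, start + 1)  # next occurrence, if any
    return found


rules = [("f", "ph"), ("s", "x"), ("x", "s")]
print(rep_suggestions("fong", rules, {"phong"}))  # ['phong']
```

Unlike a MAP class, a rule like `REP f ph` changes the length of the word, which is exactly the case MAP cannot express.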

Regards
Marcin


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
