https://bugs.documentfoundation.org/show_bug.cgi?id=104195
László Németh <[email protected]> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]
--- Comment #3 from László Németh <[email protected]> ---
The word tokenization of command-line Hunspell differs from the LibreOffice
break iterator. Hunspell in LibreOffice handles such combined Unicode characters
well; you only need to use UTF-8 encoded aff and dic files:
------ gug.aff ------
SET UTF-8
.....
# for suggestions with correct combined diacritics:
MAP 2
MAP aá
MAP g(g̃)
------- gug.dic -----
100000
ág̃a
(If both precomposed and combined diacritics are common for the given language,
you need the canonical form.)
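As a rough illustration of the precomposed vs. combined distinction, here is a
minimal Python sketch using the standard unicodedata module. It assumes that
"canonical form" means Unicode NFC for the dictionary entries; that choice and
the helper name to_canonical are my assumptions, not something Hunspell
requires.

```python
import unicodedata


def to_canonical(word: str, form: str = "NFC") -> str:
    """Return the word in the given Unicode normalization form.

    Hypothetical helper: NFC is assumed here as the "canonical form".
    """
    return unicodedata.normalize(form, word)


def codepoints(word: str) -> list:
    """List the code points of a word as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in word]


# "ág̃a" typed with combining marks only: a + acute, g + tilde, a
decomposed = "a\u0301g\u0303a"
composed = to_canonical(decomposed)

# The acute composes into U+00E1, but g with tilde has no precomposed
# code point, so it stays as g + U+0303 even in NFC.
print(codepoints(decomposed))
print(codepoints(composed))
```

Normalizing both the word list and the MAP entries the same way keeps the
dictionary consistent with whatever form the text being checked is expected to
arrive in.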
See also the hunspell(4) manual page, for example:
Use parenthesized groups for character sequences (eg. for composed
Unicode characters):
MAP 3
MAP ß(ss) (character sequence)
MAP ﬁ(fi) ("fi" compatibility characters for Unicode fi ligature)
MAP (ọ́)o (composed Unicode character: ó with bottom dot)
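To make the parenthesized-group syntax concrete, here is a small Python sketch
that splits the value of a MAP line into its related items. This is only my
reading of the syntax shown in the manual excerpt, not Hunspell's actual
parser, and the function name parse_map_entry is hypothetical.

```python
from typing import List


def parse_map_entry(entry: str) -> List[str]:
    """Split a MAP value such as 'ß(ss)' into its related items.

    A parenthesized group counts as one item; any other code point
    is an item on its own. Sketch only, not Hunspell's own code.
    """
    items = []
    i = 0
    while i < len(entry):
        if entry[i] == "(":
            close = entry.find(")", i + 1)
            if close == -1:
                raise ValueError("unclosed group in MAP entry: %r" % entry)
            items.append(entry[i + 1:close])
            i = close + 1
        else:
            items.append(entry[i])
            i += 1
    return items


print(parse_map_entry("\u00df(ss)"))           # ß and the sequence ss
print(parse_map_entry("g(g\u0303)"))           # g and g + combining tilde
print(parse_map_entry("(o\u0323\u0301)o"))     # o + dot below + acute, and o
```

Without the parentheses, a base letter and its combining mark would be read as
two unrelated items, which is why the combined characters need the grouping.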