https://bugs.documentfoundation.org/show_bug.cgi?id=104195

László Németh <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #3 from László Németh <[email protected]> ---
The command-line Hunspell tokenizes words differently from the LibreOffice
break iterator. Hunspell in LibreOffice handles such combining Unicode
character sequences well; you only need UTF-8 encoded aff and dic files:

------ gug.aff ------
SET UTF-8 
.....

# for suggestions with correct combined diacritics:

MAP 2
MAP aá
MAP g(g̃)


-------  gug.dic -----
100000
ág̃a
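
To see why "g̃" needs a parenthesized group while "á" does not, here is a
minimal Python sketch that only inspects the code points with the standard
unicodedata module. It does not call Hunspell; the helper name describe() and
the variable names are made up for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The dictionary word written with combining marks only:
# a + COMBINING ACUTE ACCENT, g + COMBINING TILDE, a
word = "a\u0301g\u0303a"

nfc = unicodedata.normalize("NFC", word)  # canonical composition
nfd = unicodedata.normalize("NFD", word)  # canonical decomposition

# NFC composes a + U+0301 into the single code point U+00E1, but
# g + U+0303 has no precomposed form and stays two code points.
print(len(word), len(nfc), len(nfd))  # 5 4 5
for code_point, name in describe(nfc):
    print(code_point, name)
```

Because no normalization form can turn "g̃" into one code point, it always
reaches the spell checker as a sequence, which is what the parenthesized MAP
group describes.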

(If both precomposed and combined diacritics are common for the given language,
you need the canonical form 


See also the hunspell(4) manual page, for example:

       Use parenthesized groups for character sequences (eg. for composed
       Unicode characters):

              MAP 3
              MAP ß(ss)  (character sequence)
              MAP ﬁ(fi)  ("fi" compatibility characters for Unicode fi ligature)
              MAP (ọ́)o   (composed Unicode character: ó with bottom dot)
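
As a rough illustration of how the parenthesized groups in a MAP entry can be
read, the following Python sketch splits the argument of a MAP line into
units, treating each group as one unit and every other code point as a unit of
its own. This is an assumption-laden toy, not Hunspell's actual parser:
parse_map_entry() is a hypothetical helper, and it does not handle the
"MAP <count>" header line or trailing comments.

```python
def parse_map_entry(line):
    """Split the argument of a MAP entry into related units.

    A parenthesized group counts as one unit; any other code point is
    a unit on its own. Illustration only, not Hunspell's own parser.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "MAP":
        raise ValueError(f"not a MAP entry: {line!r}")
    arg = fields[1]
    units = []
    i = 0
    while i < len(arg):
        if arg[i] == "(":
            close = arg.find(")", i)
            if close == -1:
                raise ValueError(f"unclosed group in {line!r}")
            units.append(arg[i + 1:close])  # text between the parentheses
            i = close + 1
        else:
            units.append(arg[i])  # a single code point
            i += 1
    return units

# g + COMBINING TILDE stays together only because it is parenthesized.
print(parse_map_entry("MAP g(g\u0303)"))
print(parse_map_entry("MAP \u00df(ss)"))
```

Without the parentheses, the combining tilde would be split off as a separate
unit, so the base letter and its diacritic would no longer be treated as one
related character.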
