https://bugs.documentfoundation.org/show_bug.cgi?id=140382

Jonathan Clark <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #3 from Jonathan Clark <[email protected]> ---
I investigated this bug as part of my work on bug 46950. Specifically, I wanted
to determine if this was an LO-specific issue, or if it originated in an
upstream project.

The root cause of this bug is incomplete upstream Hebrew dictionary data.
Currently, the dictionary doesn't list geresh (U+05F3), gershayim (U+05F4), or
the right double quotation mark (U+201D) as word characters.

To demonstrate this, I ran the following test directly against hunspell.

 $ hunspell -d he_IL 
 Hunspell 1.7.2
 ג'ירפה
 *
 ג’ירפה
 *
 ג׳ירפה
 & ג 15 0: ה, גו, גא, גע, גח, חג, גש, גס, גז, זג, גד, דג, גג, גב, גר
 *
 דו"ח
 *
 דו”ח
 *
 & ח 15 3: כ, חי, אח, קח, חש, שח, חס, חד, חג, גח, חב, נח, חט, טח, צח
 דו״ח
 *
 & ח 15 3: כ, חי, אח, קח, חש, שח, חס, חד, חג, גח, חב, נח, חט, טח, צח

This output shows that the words containing an apostrophe, a right single
quotation mark, or a quotation mark were each interpreted correctly as a single
word. However, the words containing a geresh, a right double quotation mark, or
gershayim were each incorrectly split into two words.
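
The splitting can be illustrated with a small sketch. This is a simplified
model, not Hunspell's actual tokenizer: it assumes a character belongs to a
word if it is a letter or appears in a set of extra word characters. The
function name split_words and the "before" set are hypothetical and chosen
only to mirror the behavior seen above.

```python
# Simplified model of word splitting; NOT Hunspell's real tokenizer.
def split_words(text, wordchars=""):
    """Return (offset, word) pairs; letters and `wordchars` count as word characters."""
    words = []
    start = None
    for i, ch in enumerate(text):
        if ch.isalpha() or ch in wordchars:
            if start is None:
                start = i
        elif start is not None:
            words.append((start, text[start:i]))
            start = None
    if start is not None:
        words.append((start, text[start:]))
    return words

GERESH = "\u05f3"              # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05f4"           # HEBREW PUNCTUATION GERSHAYIM
RIGHT_DOUBLE_QUOTE = "\u201d"  # RIGHT DOUBLE QUOTATION MARK

# The two test words, spelled with geresh and gershayim respectively.
giraffe = "\u05d2" + GERESH + "\u05d9\u05e8\u05e4\u05d4"
report = "\u05d3\u05d5" + GERSHAYIM + "\u05d7"

# Assumption: the unmodified setup accepts ', " and U+2019 inside words.
before = "'\"\u2019"
after = before + GERESH + GERSHAYIM + RIGHT_DOUBLE_QUOTE

print(split_words(giraffe, before))  # two pieces, split at the geresh
print(split_words(giraffe, after))   # one piece
print(split_words(report, before))   # two pieces, split at the gershayim
```

Under this model, the second piece of the gershayim word starts at character
offset 3, which lines up with the offset in the "& ח 15 3" lines above.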

I then edited my local he_IL.aff file to add geresh, right double quotation
mark, and gershayim to the WORDCHARS line, and re-ran the same test with the
words read from sample.txt:

 $ hunspell -d he_IL <sample.txt 
 Hunspell 1.7.2
 ג'ירפה
 *
 ג’ירפה
 *
 ג׳ירפה
 & ג׳ירפה 2 0: גירפה, ג'ירפה
 דו"ח
 *
 דו”ח
 & דו”ח 3 0: דוח, דווח, דו"ח
 דו״ח
 & דו״ח 3 0: דוח, דווח, דו"ח

With my modified he_IL.aff file, hunspell now correctly sees every case as a
single word, although it reports the geresh, right double quotation mark, and
gershayim spellings as misspelled.
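
For anyone who wants to repeat the experiment, the WORDCHARS edit can be
sketched as follows. This is only an illustration under assumptions: the helper
name add_wordchars is hypothetical, the sample .aff text is made up rather than
copied from the real he_IL.aff, and the caller is assumed to read and write the
file in whatever encoding it declares.

```python
def add_wordchars(aff_text, extra):
    """Append the characters of `extra` that are missing from the WORDCHARS line.

    If the text has no WORDCHARS line, one is added at the end.
    """
    lines = aff_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        parts = line.split(None, 1)
        if parts and parts[0] == "WORDCHARS":
            current = parts[1] if len(parts) > 1 else ""
            missing = "".join(ch for ch in extra if ch not in current)
            lines[i] = "WORDCHARS " + current + missing
            found = True
    if not found:
        lines.append("WORDCHARS " + extra)
    return "\n".join(lines) + "\n"

# Geresh, gershayim, right double quotation mark.
EXTRA = "\u05f3\u05f4\u201d"

# Made-up sample input, not the real he_IL.aff contents.
sample_aff = "SET UTF-8\nWORDCHARS '\"\n"
print(add_wordchars(sample_aff, EXTRA))
```

Running the helper a second time on its own output leaves the text unchanged,
so it is safe to apply repeatedly to a local copy.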

Our Hebrew dictionary data comes from an upstream project, Hspell. In order to
support these characters properly, I think it would be best to approach the
Hspell maintainers with this request.
