https://bugs.documentfoundation.org/show_bug.cgi?id=140382
Jonathan Clark <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #3 from Jonathan Clark <[email protected]> ---
I investigated this bug as part of my work on bug 46950. Specifically, I wanted
to determine if this was an LO-specific issue, or if it originated in an
upstream project.

The root cause for this bug is incomplete upstream Hebrew dictionary data.
Currently, the dictionary doesn't list geresh, gershayim, or the right double
quotation mark as word characters.

To demonstrate this, I ran the following test directly against hunspell.

$ hunspell -d he_IL
Hunspell 1.7.2
ג'ירפה
*

ג’ירפה
*

ג׳ירפה
& ג 15 0: ה, גו, גא, גע, גח, חג, גש, גס, גז, זג, גד, דג, גג, גב, גר
*

דו"ח
*

דו”ח
*
& ח 15 3: כ, חי, אח, קח, חש, שח, חס, חד, חג, גח, חב, נח, חט, טח, צח

דו״ח
*
& ח 15 3: כ, חי, אח, קח, חש, שח, חס, חד, חג, גח, חב, נח, חט, טח, צח

This output shows the words containing apostrophe, right single quotation mark,
and quotation mark were all interpreted correctly as a single word. However,
words containing geresh, right double quotation mark, and gershayim were each
incorrectly interpreted as two words.

I then edited my local he_IL.aff file to add geresh, right double quotation
mark, and gershayim to the WORDCHARS line, and re-ran the above command:

$ hunspell -d he_IL <sample.txt
Hunspell 1.7.2
ג'ירפה
*

ג’ירפה
*

ג׳ירפה
& ג׳ירפה 2 0: גירפה, ג'ירפה

דו"ח
*

דו”ח
& דו”ח 3 0: דוח, דווח, דו"ח

דו״ח
& דו״ח 3 0: דוח, דווח, דו"ח

With my modified he_IL.aff file, hunspell now correctly sees all cases as a
single word (although it says they're spelled incorrectly).

Our Hebrew dictionary data comes from an upstream project, Hspell. In order to
support these characters properly, I think it would be best to approach the
Hspell maintainers with this request.
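The effect of the WORDCHARS change described above can be illustrated with a
small Python sketch. This is not Hunspell's actual tokenizer; it is a
simplified model that treats a word as a maximal run of Hebrew letters plus
whatever extra characters are passed in. The CURRENT and PATCHED sets are
assumptions inferred from the test output in this comment, not values read
from he_IL.aff, and the tokenize helper is hypothetical.

```python
import re

# Hebrew letters alef..tav, written as a character-class range.
HEBREW_LETTERS = "\u05d0-\u05ea"

APOSTROPHE = "'"
RIGHT_SINGLE_QUOTE = "\u2019"
QUOTATION_MARK = '"'
GERESH = "\u05f3"
GERSHAYIM = "\u05f4"
RIGHT_DOUBLE_QUOTE = "\u201d"

# Assumption: the three characters that already worked in the test above.
CURRENT = APOSTROPHE + RIGHT_SINGLE_QUOTE + QUOTATION_MARK
# Assumption: the same set after the local WORDCHARS edit.
PATCHED = CURRENT + GERESH + GERSHAYIM + RIGHT_DOUBLE_QUOTE


def tokenize(text, wordchars):
    """Return maximal runs of Hebrew letters and the given extra word characters."""
    pattern = "[" + HEBREW_LETTERS + re.escape(wordchars) + "]+"
    return re.findall(pattern, text)


if __name__ == "__main__":
    samples = [
        "\u05d2\u05f3\u05d9\u05e8\u05e4\u05d4",  # "giraffe" written with geresh
        "\u05d3\u05d5\u05f4\u05d7",              # "report" written with gershayim
    ]
    for word in samples:
        # Print how many tokens each word yields under each character set.
        print(len(tokenize(word, CURRENT)), len(tokenize(word, PATCHED)))
```

Under the CURRENT set each sample splits into two tokens at the geresh or
gershayim, which mirrors the two result lines per word in the first hunspell
run. Under the PATCHED set each sample stays a single token, which is then
looked up as a whole, as in the second run.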
