[Bug 89578] EDITING: spellcheck 2-word phrases where 1 word would be wrong if found outside of the phrase

bugzilla-daemon Thu, 13 Nov 2025 13:05:03 -0800

https://bugs.documentfoundation.org/show_bug.cgi?id=89578


--- Comment #3 from Nick Levinson <[email protected]> ---
It appears that multi-word support currently works by treating each
constituent one-word string as correctly spelled on its own, even though it is
wrong or too rare for inclusion unless it is adjacent to another word that is
itemized for left-adjacency or right-adjacency (including in phrases that are
3 or more words long).

This applies to legal and medical terminology, place names, business names,
personal names, foreign phrases that have been accepted into English usage
even if usually italicized (recommending italicization to a user might be a
separate feature request), and probably unlimited other categories.

So, now, "York", "Los", "Hampshire", "est", and "Francisco" are accepted, even
though as standalone words in English they're probably very rare, so they
should be marked as wrong by default unless the user wants to allow exceptions.
Rarities are usually omitted from spell-check dictionaries because in a typical
user's context the string is more likely to be a misspelling the user will want
to correct.

Merriam-Webster's Third (approximate title) dictionary, unabridged, says in its
frontmatter that if a word is formed in English as set solid, hyphenated, and
spaced, it is entered into the dictionary with only one form. Usually, the
senses, pronunciations, etymologies, etc. would be the same anyway, and that
saves space, but that means that even that unabridged dictionary is not an
authority for determining whether unlisted forms are uncommon in English.

An introductory book on computers, I think on Linux, said that "file system"
and "filesystem" do not have the same meaning. The only way that occurs to me
to solve that problem in a spell-check would be with a tooltip or similar
display asking the user which meaning is intended.

Back to accepting "York", "Los", etc.: I disagree with that being the solution
to recognizing "New York", "Los Angeles", "New Hampshire", "id est" (the
expansion of "i.e."), and "San Francisco", respectively.

But I also know that designing spell-check to recognize multi-word strings is
harder. My guess is to do multiple passes, with a separate dictionary for each
number of spaces in a string and a pass through the whole document or through
recent edits for strings with the most spaces per string and then repeating
until ending with a pass for spaceless strings. This also needs a way to file
a string that the user accepts into the supplemental dictionary that matches
the number of spaces within the string.
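
A minimal sketch of the multi-pass idea, in Python, may make it easier to
discuss. It is not how LibreOffice or its spell-check engine works; the
function names (add_entry, build_dictionaries, find_misspellings) are
hypothetical, matching is case-sensitive, and punctuation between words is
ignored, which a real implementation would have to handle.

```python
import re
from collections import defaultdict


def add_entry(by_spaces, entry):
    """File an accepted string under the number of spaces it contains."""
    normalized = " ".join(entry.split())  # collapse runs of whitespace
    by_spaces[normalized.count(" ")].add(normalized)


def build_dictionaries(entries):
    """Build one dictionary (a set) per number of spaces in a string."""
    by_spaces = defaultdict(set)
    for entry in entries:
        add_entry(by_spaces, entry)
    return by_spaces


def find_misspellings(text, by_spaces):
    """Return the words not covered by any dictionary entry."""
    # Simplification: punctuation is dropped, so a phrase could match
    # across a sentence boundary.
    words = re.findall(r"[A-Za-z']+", text)
    accepted = [False] * len(words)
    # One pass per dictionary, strings with the most spaces first,
    # ending with the pass for spaceless strings.
    for spaces in sorted(by_spaces, reverse=True):
        size = spaces + 1
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            if candidate in by_spaces[spaces]:
                for i in range(start, start + size):
                    accepted[i] = True
    return [word for word, ok in zip(words, accepted) if not ok]


dictionaries = build_dictionaries(
    ["New York", "San Francisco", "I", "flew", "from", "to"]
)
in_phrase = find_misspellings(
    "I flew from New York to San Francisco", dictionaries
)
standalone = find_misspellings("I flew from York to Francisco", dictionaries)
```

With these sample entries, "York" and "Francisco" are accepted only inside
their phrases and are reported when they stand alone.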

It is possible to use one dictionary sorted first by number of spaces and then
by today's sortation method, but for user-editable dictionaries that would be
confusing when a user is trying to find, edit, or add an entry.
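
To illustrate why that order could confuse a user, here is a small Python
sketch of such a sort. It assumes that today's sortation method is a plain
case-insensitive alphabetical order, which may not be what the dictionaries
actually use; sort_key is a hypothetical helper.

```python
def sort_key(entry):
    """Sort by number of spaces first, then alphabetically (assumed)."""
    return (entry.count(" "), entry.casefold())


entries = [
    "San Francisco",
    "apple",
    "id est",
    "New York",
    "zebra",
    "Rio de Janeiro",
]
ordered = sorted(entries, key=sort_key)
```

Here "id est" lands after "zebra", and "Rio de Janeiro" comes last, so a user
scanning alphabetically would not find them where expected.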

I'm not clear on how
https://bugs.documentfoundation.org/show_bug.cgi?id=154499 relates to this,
even indirectly, but I think it does.

-- 
You are receiving this mail because:
You are the assignee for the bug.
