On Monday, 20 January 2025 at 00:19:44 UTC, Richard (Rikki)
Andrew Cattermole wrote:
On 24/11/2024 9:53 AM, cookiewitch wrote:
I… don't know? My idea of a "word" here is any unbreakable
unit. I guess I have another round of looking up Unicode
algorithms ahead of me.
Unfortunately word breaking isn't a simple algorithm to
implement.
https://www.unicode.org/reports/tr29/#Word_Boundaries
White space and punctuation alone cannot differentiate words.
Nor can they be used to fake identifiers in a programming
language.
Additionally many words are actually break-
-able because you can hyphenate them to wrap
over multiple lines neatly. Books and news-
-papers employ this technique more
frequently than most other media due
to limited page space and printing costs.
However, when space isn’t physically limited (and a book or
newspaper is not being emulated), breaking up words is
unfavourable because it compromises readability. So it’s nice to
have a word breaking option (for when emulating printed media is
desirable, like rendering a newspaper texture that has randomly
generated text), but it should by no means be the default. And
it’s a lot of work for such a subtle and language-specific
feature.
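
To make the "option, but not the default" idea concrete, here is a rough sketch of a greedy line wrapper where hyphenation has to be switched on explicitly. Everything in it is an assumption for illustration: `wrap`, the `hyphenate` flag and the tiny `BREAK_POINTS` table are made-up names, and a real implementation would need per-language hyphenation data rather than a hand-written dictionary.

```python
# Hypothetical hyphenation table: maps a lowercase word to the index where it
# may be split. Real break points are language-specific and would come from
# proper hyphenation data, not from a hand-written dict like this one.
BREAK_POINTS = {
    "breakable": 5,   # break-able
    "newspapers": 4,  # news-papers
}


def wrap(text, width, hyphenate=False):
    """Greedy line wrap. Words are only split when hyphenate=True."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= width:
            current = candidate
            continue

        # The word does not fit. Try to split it, but only if asked to.
        if hyphenate and word.lower() in BREAK_POINTS:
            idx = BREAK_POINTS[word.lower()]
            head, tail = word[:idx] + "-", word[idx:]
            split = f"{current} {head}" if current else head
            if len(split) <= width:
                lines.append(split)
                current = tail
                continue

        # Default behaviour: keep the word whole and move it to a new line.
        # A word longer than `width` is left to overflow rather than broken.
        if current:
            lines.append(current)
        current = word

    if current:
        lines.append(current)
    return lines


print(wrap("many words are breakable", 21))
# ['many words are', 'breakable']
print(wrap("many words are breakable", 21, hyphenate=True))
# ['many words are break-', 'able']
```

With the flag off, words always stay intact, which is what you want for ordinary UI text. The hyphenating path only kicks in when the caller opts in, e.g. for that newspaper texture.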