Using libutf8proc for svn_utf_cstring_utf8_width()

Timofei Zhakov Thu, 14 May 2026 05:13:48 -0700

This function counts the printable characters in a UTF-8 string. It
currently embeds its own hand-maintained table of character ranges and
checks code points against it manually. I believe it was copied from
elsewhere a long time ago, before utf8proc became a required dependency.
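For anyone who hasn't looked at the code, here is a minimal sketch of the general table-driven technique: decode each UTF-8 sequence to a code point, then binary-search a sorted table of code point ranges. This is not our actual implementation. The table is a tiny hypothetical excerpt (three combining-mark ranges treated as zero width), wide characters are ignored, and the names `sketch_width`, `in_table` and `decode_utf8` are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical excerpt of a sorted, non-overlapping range table.
   A real table lists many more ranges; these are illustrative only. */
struct interval { unsigned int first, last; };

static const struct interval zero_width[] = {
  { 0x0300, 0x036F },  /* Combining Diacritical Marks */
  { 0x0483, 0x0489 },  /* Cyrillic combining marks */
  { 0x0591, 0x05BD },  /* Hebrew accents and points */
};

/* Return 1 if cp falls inside one of the n ranges (binary search). */
static int in_table(unsigned int cp, const struct interval *table, size_t n)
{
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (cp < table[mid].first)
      hi = mid;
    else if (cp > table[mid].last)
      lo = mid + 1;
    else
      return 1;
  }
  return 0;
}

/* Decode one UTF-8 sequence at s into *cp and return its length in bytes,
   or 0 if malformed. Simplified: overlong forms and surrogates are not
   rejected. Stops at the first non-continuation byte (including NUL). */
static size_t decode_utf8(const unsigned char *s, unsigned int *cp)
{
  size_t len, i;
  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
  else return 0;
  for (i = 1; i < len; i++) {
    if ((s[i] & 0xC0) != 0x80)
      return 0;
    *cp = (*cp << 6) | (s[i] & 0x3F);
  }
  return len;
}

/* Hypothetical width: -1 for malformed input, C0 controls or DEL;
   0 for each code point found in zero_width[]; 1 for anything else. */
static int sketch_width(const char *cstr)
{
  const unsigned char *s = (const unsigned char *)cstr;
  int width = 0;
  while (*s) {
    unsigned int cp;
    size_t len = decode_utf8(s, &cp);
    if (len == 0 || cp < 0x20 || cp == 0x7F)
      return -1;
    if (!in_table(cp, zero_width, sizeof zero_width / sizeof zero_width[0]))
      width += 1;
    s += len;
  }
  return width;
}
```

For example, "e" followed by U+0301 (combining acute accent) counts as one column. Every entry in the range table has to be kept in sync with Unicode by hand, which is the maintenance problem described below.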


I have a few reasons to rewrite it to use the library instead:

1. I'm pretty sure nobody will ever bother to update our copy of the
dataset. utf8proc, on the other hand, bundles complete character data
for the latest Unicode version it supports, so upgrading the library
keeps that data current.

2. utf8proc also provides a *display* width property, under which
characters such as emoji are wider than ordinary characters, even in
monospace fonts.

(For context: I want to fix the alignment in several places in our
command-line client, such as the author column in 'svn list -v', where
some author names currently mess up the table layout. This is where a
function like that would be useful.)

3. It cleans up redundant code.

4. Using their dataset might be slightly faster: utf8proc only does a
couple of direct lookups into static tables (one to find the address
of the property record, then one to read the properties) instead of a
binary search over all the ranges. It could also turn out to be slower,
though; I don't know.
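To make point 4 concrete, here is a toy version of the multi-stage lookup idea, contrasted with the binary search over ranges. It is not utf8proc's code; it only assumes the common layout where properties are split into 256-code-point blocks, a first-stage array maps each block number to a row of per-code-point values, and identical rows are shared. The data covers only U+0000..U+03FF, and the names `stage1`, `stage2`, `build_tables` and `lookup` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_BITS 8
#define BLOCK_SIZE (1u << BLOCK_BITS)
#define NUM_BLOCKS 4   /* toy table: covers U+0000..U+03FF only */

/* Hypothetical source data: code points in a range get 'value';
   everything else defaults to width 1. */
struct range { unsigned int first, last; unsigned char value; };

static const struct range ranges[] = {
  { 0x0300, 0x036F, 0 },   /* combining marks: width 0 */
};

/* Stage 1 maps a block number to a row of stage 2. */
static unsigned char stage1[NUM_BLOCKS];
static unsigned char stage2[NUM_BLOCKS][BLOCK_SIZE];
static size_t num_rows;

/* Precompute the tables once; identical blocks share one stage-2 row,
   which is what keeps real tables compact. */
static void build_tables(void)
{
  unsigned int block, off;
  size_t r, row;

  num_rows = 0;
  for (block = 0; block < NUM_BLOCKS; block++) {
    unsigned char tmp[BLOCK_SIZE];
    for (off = 0; off < BLOCK_SIZE; off++) {
      unsigned int cp = (block << BLOCK_BITS) | off;
      tmp[off] = 1;                              /* default width */
      for (r = 0; r < sizeof ranges / sizeof ranges[0]; r++)
        if (cp >= ranges[r].first && cp <= ranges[r].last)
          tmp[off] = ranges[r].value;
    }
    for (row = 0; row < num_rows; row++)         /* reuse an identical row */
      if (memcmp(stage2[row], tmp, BLOCK_SIZE) == 0)
        break;
    if (row == num_rows)
      memcpy(stage2[num_rows++], tmp, BLOCK_SIZE);
    stage1[block] = (unsigned char)row;
  }
}

/* Two array accesses, however many ranges the source data has.
   Only valid for cp < NUM_BLOCKS * BLOCK_SIZE in this toy version. */
static unsigned char lookup(unsigned int cp)
{
  return stage2[stage1[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)];
}
```

The trade-off is memory for speed: each distinct block costs a full row, but the lookup cost no longer grows with the number of ranges. Whether that actually beats a binary search over a short table in practice would still need measuring.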

Thoughts?

-- 
Timofei Zhakov
