On Thu, 14 May 2026 at 14:13, Timofei Zhakov <[email protected]> wrote:

> This function counts real printable UTF-8 characters in a string. It
> currently contains a table of all patterns that is checked manually. I
> believe it was stolen from elsewhere a long time ago, before we had
> utf8proc as a required dependency.
>
> I have a few reasons to rewrite it to use the library instead:
>
> 1. I'm pretty sure nobody would ever care to update the dataset. On
> the other hand, utf8proc bundles all available information about the
> latest Unicode version that is supported on the current platform.
>
> 2. There is also a property that defines *display* width, which
> basically makes symbols like emoji wider than normal characters even
> in monospace fonts.
>
> (For context, I want to fix the indentation in places throughout our
> cmdline, such as the authors in 'svn list -v', which mess up the
> tables. This is where a function like that will be useful.)
>
> 3. Clean up redundant code.
>
> 4. It might be slightly faster to use their dataset, because utf8proc
> only accesses a table in static memory twice (once for the address and
> once to retrieve the properties) instead of binary searching and
> checking all ranges. Maybe it's slower, though; I don't know.
>
> Thoughts?

Sounds good to me.
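
To make points 2 and 4 concrete, here is a minimal sketch of the range-table approach: a sorted table of code point ranges, a binary search over it, and a loop that sums display columns. The table contents and the names width_range_t, codepoint_width, decode_utf8 and display_width are hypothetical and chosen for illustration only. This is not the existing Subversion code, and the four ranges are a tiny hand-picked sample, not real Unicode width data. The decoder is deliberately simple and does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily truncated range table: NOT the real Unicode data. */
typedef struct { uint32_t first, last; int width; } width_range_t;

static const width_range_t ranges[] = {
  { 0x0300,  0x036F,  0 },  /* combining diacritical marks */
  { 0x1100,  0x115F,  2 },  /* Hangul Jamo initial consonants */
  { 0x4E00,  0x9FFF,  2 },  /* CJK unified ideographs */
  { 0x1F300, 0x1F64F, 2 },  /* a block of emoji */
};

/* Binary search the sorted table; unlisted code points default to width 1. */
static int codepoint_width(uint32_t cp)
{
  size_t lo = 0, hi = sizeof(ranges) / sizeof(ranges[0]);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (cp < ranges[mid].first)
        hi = mid;
      else if (cp > ranges[mid].last)
        lo = mid + 1;
      else
        return ranges[mid].width;
    }
  return 1;
}

/* Decode one UTF-8 sequence into *cp. Returns the number of bytes
   consumed, or 0 if the sequence is malformed or truncated. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
  size_t n, i;
  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
  else return 0;
  for (i = 1; i < n; i++)
    {
      /* A NUL terminator fails this check too, so we never read past it. */
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  return n;
}

/* Display width of a NUL-terminated UTF-8 string in terminal columns,
   or -1 if the input is malformed. Control characters are not handled. */
static int display_width(const char *str)
{
  const unsigned char *s = (const unsigned char *)str;
  int width = 0;
  while (*s)
    {
      uint32_t cp;
      size_t n = decode_utf8(s, &cp);
      if (n == 0)
        return -1;
      width += codepoint_width(cp);
      s += n;
    }
  return width;
}
```

With utf8proc, the table and the binary search would presumably be replaced by a per-code-point property lookup, which is the part whose cost point 4 is unsure about.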

Regarding the potential performance regression: is it something we can
measure? As far as I understand, svn_utf_cstring_utf8_width() is not used
in performance-critical code, but it would be nice to know whether there
is a significant performance regression anyway.

-- 
Ivan Zhakov
