On Thu, 14 May 2026 at 14:13, Timofei Zhakov <[email protected]> wrote:
> This function counts real printable UTF characters in a string. It
> currently contains a table of all patterns that is manually checked. I
> believe it was stolen from elsewhere a long time ago, before we had
> utf8proc as a required dependency.
>
> I have a few reasons to rewrite it to use the library instead:
>
> 1. I'm pretty sure nobody would ever care to update the dataset. On
> the other hand, utf8proc bundles all available information about the
> latest Unicode version that is supported on the current platform.
>
> 2. There is also a property that defines *display* width, that
> basically makes symbols like emojis wider than normal characters even
> on monospace fonts.
>
> (For context I want to fix indentation in places throughout our
> cmdline like the authors in 'svn list -v' that mess up the tables.
> This is where a function like that will be useful.)
>
> 3. Clean up redundant code.
>
> 4. It might be slightly faster to use their dataset because utf8proc
> only accesses a table in static memory twice (once for the address and
> then to retrieve the properties) instead of binary searching and
> checking all ranges. Maybe it's slower though, idk.
>
> Thoughts?
>

Sounds good to me.

Regarding potential performance regression: is it something we can
measure? As far as I understand, svn_utf_cstring_utf8_width() is not
used in performance-critical code, but it would be nice to know if there
is a significant performance regression anyway.

-- 
Ivan Zhakov

