bug#79702: Fw: bug#79702: request: flag for visually identical but different unicode characters

Dale R. Worley Thu, 06 Nov 2025 09:15:35 -0800

Paul Eggert <[email protected]> writes:
>> Who gets to identify the look-alikes?
>
> The Unicode Consortium has done this, and as is usual with characters, 
> it's complicated. See:
>
> https://www.unicode.org/reports/tr39/#Confusable_Detection


ISTM that trying to incorporate this functionality into grep would be an
endless maintenance chore.  It would probably be better to have a separate
utility (project) that "canonicalizes" each confusable character into one
standard form.  Then you can use grep to do the search.  If I've got all
the shell constructions right, the one-line form would be:

    $ grep "$( canonicalize -opts <<<'pattern' )" \
        <(canonicalize -opts file1) <(canonicalize -opts file2) ...
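To make the idea concrete, here is a minimal sketch, assuming bash and UTF-8 text, of what a hypothetical "canonicalize" could look like.  The function name comes from the message above, but its option-free interface, the six-entry mapping table and the demo data are illustrative assumptions.  A real tool would build its table from Unicode's confusables.txt (UTS #39) rather than hard-coding a few Cyrillic and Greek look-alikes.

```shell
#!/usr/bin/env bash
# Minimal sketch of the hypothetical "canonicalize" tool.  It maps a few
# look-alikes to Latin letters:
#   U+0430 Cyrillic a   -> a      U+0435 Cyrillic ie     -> e
#   U+043E Cyrillic o   -> o      U+0440 Cyrillic er     -> p
#   U+0441 Cyrillic es  -> c      U+03BF Greek omicron   -> o
# The table is illustrative only; a real tool would generate it from
# Unicode's confusables.txt (UTS #39).  Characters are written as UTF-8
# byte escapes so the script behaves the same in any locale.
# Like sed, it reads the named files, or standard input if none are given.
canonicalize() {
  sed -e $'s/\xd0\xb0/a/g' \
      -e $'s/\xd0\xb5/e/g' \
      -e $'s/\xd0\xbe/o/g' \
      -e $'s/\xd1\x80/p/g' \
      -e $'s/\xd1\x81/c/g' \
      -e $'s/\xce\xbf/o/g' \
      "$@"
}

# Demo file: the first line spells "paypal" with a Cyrillic a.
tmp=$(mktemp)
printf $'p\xd0\xb0ypal\npaypal\nexample\n' > "$tmp"

# Canonicalize the pattern and the file, then let plain grep search.
grep -c "$(canonicalize <<<'paypal')" <(canonicalize "$tmp")   # prints 2
rm -f "$tmp"
```

Process substitution and "<<<" are bash (also ksh/zsh) features, not POSIX sh.  One side effect of the approach is that, with several inputs, grep labels matches with /dev/fd/N names rather than the original file names.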

Since surely there are variations on canonicalization, I've shown
"-opts".  Also, the "<<<" construction adds a newline at the end, but
"$( ... )" strips trailing newlines from its output, so the pattern
that reaches grep shouldn't end in a newline either way.
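The newline behavior can be checked directly.  This is a small sketch assuming bash, with "cat" standing in for the hypothetical canonicalize: the here-string really does append a newline, while command substitution removes trailing newlines from whatever it captures.

```shell
#!/usr/bin/env bash
# "<<<" appends a newline to the here-string: wc counts 4 bytes ("abc\n").
wc -c <<<'abc'
# "$( ... )" strips trailing newlines from what it captures,
# so the stored pattern is just "abc": this prints 3.
pattern="$(cat <<<'abc')"
printf '%s' "$pattern" | wc -c
```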

Dale
