Paul Eggert <[email protected]> writes:
>> Who gets to identify the look-alikes?
>
> The Unicode Consortium has done this, and as is usual with characters,
> it's complicated. See:
>
> https://www.unicode.org/reports/tr39/#Confusable_Detection
ISTM that trying to incorporate this functionality into grep would be an
endless maintenance chore. Probably better is to have a separate
utility (project) that "canonicalize" each confusable character into one
standard form. Then you can use grep to do the search. If I've got all
the shell constructions right, the one-line form would be:
$ grep "$( canonicalize -opts <<<'pattern' )" \
<(canonicalize -opts file) <(canonicalize -opts file) ...
Since surely there are variations on canonicalization, I've shown
"-opts". Also the "<<<" construction adds a newline at the end, so you
need an option to canonicalize to remove the final newline.
Dale