Hello world! I am trying to use unit irregex to match regular expressions in UTF-8 text. Does anyone know of a way to get codepoint indices rather than byte indices for a match?
For example,

    (irregex-match-start-index (irregex-search (irregex "Č" 'utf8) "čččČččč"))

returns 6, but I want it to return 3, since there are 3 characters (6 bytes) before my match. I tried (use utf8), but it is documented as not affecting irregex, and sure enough it doesn't. I also tried passing the 'utf8 option when compiling my regex, but that doesn't change the index returned by irregex-match-start-index.

Thank you for any ideas you might have!
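For reference, a byte offset can be turned into a codepoint index by hand by counting the bytes before it that are not UTF-8 continuation bytes. The following is only a minimal sketch, not something irregex provides: byte-index->char-index is a hypothetical helper name, and it assumes the string holds valid UTF-8 as raw bytes and that the core (byte-based) string-ref is in effect rather than the one from the utf8 egg.

```scheme
;; Hypothetical helper (not part of irregex): convert a byte offset into a
;; codepoint index. Assumes STR is valid UTF-8 stored as raw bytes and that
;; string-ref is the core, byte-based version.
(define (byte-index->char-index str byte-index)
  (let loop ((i 0) (count 0))
    (if (>= i byte-index)
        count
        (let ((b (char->integer (string-ref str i))))
          ;; Continuation bytes lie in #x80..#xBF; any other byte starts
          ;; a new codepoint, so only those are counted.
          (loop (+ i 1)
                (if (<= #x80 b #xBF) count (+ count 1)))))))
```

Under those assumptions, (byte-index->char-index "čččČččč" 6) should map the byte offset 6 from the example to 3. The scan is linear in the offset, so it may be too slow if called for many matches in a long string.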
