[Chicken-users] Codepoint indices for matched regexps (UTF-8)?

Henry Hu Fri, 15 Jun 2018 06:45:55 -0700

Hello world!

I am trying to use unit irregex to match regular expressions in UTF-8
text.  Is anyone familiar with a way to ask for the codepoint indices
rather than byte indices for the match?


For example:

(irregex-match-start-index (irregex-search (irregex "Č" 'utf8) "čččČččč"))

returns 6 when I want it to return 3, since there are 3 characters (6
bytes) before my match.

I tried (use utf8), but its documentation says it doesn't affect irregex,
and sure enough it doesn't.  I also tried passing the 'utf8 option when
compiling my regex, but that doesn't change the index returned by
irregex-match-start-index.
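
The only workaround I can think of is to convert the byte index myself
by counting the bytes before the match that are not UTF-8 continuation
bytes (10xxxxxx, i.e. #x80 through #xBF).  Below is a rough sketch of
that idea.  The name byte-index->codepoint-index is made up for this
mail and is not part of irregex, and the sketch assumes string-ref is
byte-based, i.e. that the utf8 egg is not loaded in the module that
defines it:

```scheme
;; Hypothetical helper, not part of irregex or the utf8 egg.
;; Assumes STR holds valid UTF-8 and that string-ref indexes bytes
;; (plain CHICKEN strings, without the utf8 egg loaded here).
(define (byte-index->codepoint-index str byte-index)
  (let loop ((i 0) (count 0))
    (if (>= i byte-index)
        count
        (let ((b (char->integer (string-ref str i))))
          (loop (+ i 1)
                ;; Continuation bytes lie in #x80..#xBF; every other
                ;; byte starts a new codepoint, so count only those.
                (if (<= #x80 b #xBF)
                    count
                    (+ count 1)))))))

;; Intended use with the example above:
;; (byte-index->codepoint-index
;;   "čččČččč"
;;   (irregex-match-start-index
;;     (irregex-search (irregex "Č" 'utf8) "čččČččč")))
```

This walks the string from the start on every call, so it is linear in
the match position.  I would still prefer something built in if it
exists.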

Thank you for any ideas you might have!

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users