Re: [PATCH] Enable utf8->string to take a range

Maxime Devos Fri, 21 Jan 2022 14:09:02 -0800

Vijay Marupudi wrote on Fri 21-01-2022 at 15:20 [-0500]:
+  (pass-if-exception "utf8->string range: end < start"
+      exception:out-of-range
+      (let* ((utf8 (string->utf8 "gnu guile")))
+        (utf8->string utf8 1 0)))
+  [other tests]


It would be nice to check multibyte characters as well,
to verify that byte indices and not character indices are used.

E.g., (utf8->string #vu8(195 169) 0 2) should return "é".

Another nice test: (utf8->string #vu8(195 169) 0 1) should raise
a 'decoding-error', even though #vu8(195 169) is valid UTF-8,
because the range cuts the two-byte sequence in half.

And (utf8->string #vu8(0 32 196) 0 2) should return "\x00 " even
though #vu8(0 32 196) as a whole is invalid UTF-8 (196 is a lead byte
with no continuation byte).  As a bonus, it checks that the nul
character is supported, which is easily forgotten because Guile is
implemented in C, which usually terminates strings with a zero byte
instead of using a length field.

Overall, the patch you sent seems a reasonable approach to me, though
I didn't verify the details.  I find myself at times copying a part
of a bytevector to a new bytevector because some procedure doesn't
allow specifying byte ranges ...
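
A minimal sketch of the three cases above, runnable without the patch.
It assumes the ranged utf8->string should behave like decoding a copy
of the bytes in [start, end), which is the copy-based workaround just
mentioned.  The names utf8-range->string and decode-or-key are
hypothetical helpers for illustration and are not part of the patch.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper, not part of the patch: decode bytes [start, end)
;; of BV by copying them into a fresh bytevector first.
(define (utf8-range->string bv start end)
  (let* ((len (- end start))
         (copy (make-bytevector len)))
    (bytevector-copy! bv start copy 0 len)
    (utf8->string copy)))

;; Return the decoded string, or the exception key if decoding throws.
(define (decode-or-key bv start end)
  (catch #t
    (lambda () (utf8-range->string bv start end))
    (lambda (key . args) key)))

;; Both bytes of "é": indices count bytes, so the end index is 2, not 1.
(define whole-e-acute (decode-or-key #vu8(195 169) 0 2))

;; Only the lead byte of "é": the range itself is not valid UTF-8.
(define split-e-acute (decode-or-key #vu8(195 169) 0 1))

;; The range starts with NUL and stops before the dangling lead byte 196.
(define nul-and-space (decode-or-key #vu8(0 32 196) 0 2))

(for-each (lambda (result) (write result) (newline))
          (list whole-e-acute split-e-acute nul-and-space))
```

With the patch applied, calling (utf8->string bv start end) directly
should give the same three results, so these values could serve as the
expected side of the new tests.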

Greetings,
Maxime
