Hi all,

I’ve just released version 2.0.1 of the srfi-130 egg[0], which is my quixotic attempt at a better string library for CHICKEN. It’s a new, fully Unicode-aware, opaque-cursor implementation of John Cowan’s SRFI 130[1] built on top of the utf8[2] egg. Some benefits:

* String cursors, which encapsulate byte offsets, provide faster indexing and substring operations on Unicode strings than codepoint indices. For example, srfi-130’s ‘string-ref/cursor’ runs in (notional) constant time when given a cursor, while utf8’s ‘string-ref’ requires O(n) time.

* All srfi-130 procedures that take cursors can also take (codepoint) indices, so porting between srfi-13/srfi-152/utf8 and srfi-130 should be relatively easy.

* Cursors are type-safe, and you can only create valid cursors (but see “Caveats” below). Low-level functional programmers may consider this decadent, but I believe it encourages better programming. Passing hand-computed offsets to CHICKEN’s byte-oriented string operations is asking for trouble, and cursors are a more disciplined way to achieve the same goals with similar efficiency.

* Better error reporting. The srfi-130 egg tries to provide useful exceptions with correct locations which follow CHICKEN’s internal condition protocol (e.g. type errors raise (exn type) conditions). This is in contrast to the utf8 egg’s errors, which are often hard to trace (“where exactly did string-ref get that invalid index?”).

* More rigorous, randomized testing using the test-generative egg.

# Caveats

Cursors are very useful, but they don’t play well with string mutation. Mutating a string invalidates all cursors into it, and catching these situations efficiently is a hard problem. It’s also possible to use a cursor on a different string than the one it refers to, which is also an (uncaught) error. This could be averted with an ‘eqv?’ check, if it annoys enough people.

In sum, I think that the new srfi-130 egg has some important benefits while mostly maintaining backwards compatibility with srfi-13 and the other CHICKEN string libraries. I hope that some CHICKEN programmers will consider it. Suggestions and patches are welcome.

Best regards,

Wolf

[0] https://wiki.call-cc.org/eggref/5/srfi-130
[1] https://srfi.schemers.org/srfi-130/
[2] https://wiki.call-cc.org/eggref/5/utf8

Thanks to John and to Will Clinger for creating SRFI 130.

-- 
Wolfgang Corcoran-Mathe <[email protected]>
