Hi all,

The utf8 egg's unicode-char-sets component is woefully out of date: according to its header comment, the current set definitions were generated in July 2007. Since then, Unicode has gained many new characters, along with wonderful things like emoji. It's time the sets were updated.
Therefore, I've made a new version of utf8 that generates all character sets at build time from the official UCD (Unicode Character Database) data files. I've attached my low-dependency build script and related files to the following ticket:

http://bugs.call-cc.org/ticket/1851

Since some of the generated sets are very large, I've also split the unicode-char-sets component into per-set modules, e.g. (unicode-char-sets arabic). The (unicode-char-sets) module re-exports all of the Unicode character sets, so it remains backwards-compatible with the old, monolithic (unicode-char-sets) module.

One minor issue I haven't solved yet is how to compile the generated modules. Currently, the script invoked by custom-build simply runs csc, without any custom options, on each module file. This ignores the compiler options that chicken-install would normally add, but I'm not sure how to retrieve those options or how to invoke the compiler "correctly".

Let me know what you think. If these changes are welcome, I'll work on the case-mapping procedures next.

Regards,
Wolfgang

--
Wolfgang Corcoran-Mathe <w...@sigwinch.xyz>