Hi! (By the way, this GSoC project is being discussed on the GCC/Rust Zulip: <https://gcc-rust.zulipchat.com/#narrow/stream/327528-GSoC/topic/Unicode.20support>.)
I'm now also putting Mark Wielaard in CC; he also once started a discussion on this topic, "thinking of importing a couple of gnulib modules to help with UTF-8 processing [unless] other gcc frontends handle [these things] already in a way that might be reusable". See the thread "rust frontend and UTF-8/unicode processing/properties" starting at <https://inbox.sourceware.org/gcc/ypqrmbhyu3wrp...@wildebeest.org>.

On 2023-03-15T16:18:18+0100, Jakub Jelinek via Gcc <gcc@gcc.gnu.org> wrote:
> On Wed, Mar 15, 2023 at 11:00:19AM +0000, Philip Herron via Gcc wrote:
>> Excellent work on getting up to speed on the rust front-end. From my
>> perspective I am interested to see what the wider GCC community thinks
>> about using https://www.gnu.org/software/libunistring/ library within GCC
>> instead of rolling our own, this means it will be another dependency on GCC.
>>
>> The other option is there is already code in the other front-ends to do
>> this so in the worst case it should be possible to extract something out of
>> them and possibly make this a shared piece of functionality which we can
>> mentor you through.
>
> I don't know what exactly Rust FE needs in this area, but e.g. libcpp
> already handles whatever C/C++ need from Unicode support POV and can handle
> it without any extra libraries.
> So, if we could avoid the extra dependency, it would be certainly better,
> unless you really need massive amounts of code from those libraries.
> libcpp already e.g. provides mapping of unicode character names to code
> points, determining which unicode characters can appear at the start or
> in the middle of identifiers, etc.

That's exactly the answer that I expected you or someone else to give.  ;-)

That means GCC/Rust has some investigating to do: (a) whether what libcpp contains is sufficient for its needs, and (b) whether that code can sensibly be reused/extracted/refactored into a GCC-level shared source file, to be used by several front ends (possibly via libcpp).  (I suppose GCC/Rust shouldn't link in libcpp directly.)

Thanks for the input, all!


Regards
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Register court: München, HRB 106955