On Sat, Mar 18, 2023 at 05:59:34PM +0900, Raiki Tamura wrote:
> On Sat, Mar 18, 2023 at 17:47, Jonathan Wakely <jwakely....@gmail.com> wrote:
> 
> > On Sat, 18 Mar 2023, 08:32 Raiki Tamura via Gcc, <gcc@gcc.gnu.org> wrote:
> >
> >> Thank you everyone for your advice.
> >> Some kinds of names are restricted to unicode alphabetic/numeric in Rust.
> >>
> >
> > Doesn't it use the same rules as C++, based on XID_Start and XID_Continue?
> > That should already be supported.
> >
> 
> Yes, C++ and Rust use the same rules for identifiers (described in UAX#31)
> and we can reuse it in the lexer of gccrs.
> I was talking about values of Rust's crate_name attributes, which only
> allow Unicode alphabetic/numeric characters.
> (Ref:
> https://doc.rust-lang.org/reference/crates-and-source-files.html#the-crate_name-attribute
> )

That is a pretty simple thing, so no need to use an extra library for that.
As is documented in contrib/unicode/README, the Unicode *.txt files are
already checked in and there are several generators of tables.
libcpp/makeucnid.cc already creates tables based on the UnicodeData.txt,
DerivedNormalizationProps.txt and DerivedCoreProperties.txt files,
including NFC/NKFC.  It is true that it doesn't currently compute whether
a character is alphanumeric.  That is either the Alphabetic property in
DerivedCoreProperties.txt or, for numeric, the Nd, Nl or No general
category (3rd column) in UnicodeData.txt.  It should take only a few lines
to add that support to libcpp/makeucnid.cc; the only question is whether
the ucnranges array would become much larger if it also differentiates
based on another ALPHANUM flag.  If it doesn't grow too much, let's put it
there; if it would grow too much, perhaps we should emit it in a separate
table.
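
To make the above concrete, here is a rough, standalone sketch of the
computation.  It is not code from libcpp/makeucnid.cc: the ALPHA and
NUMERIC flag names, the helper functions and the Range struct are all
made up for illustration, and the range list does not try to mirror the
real ucnranges layout.  It only assumes the usual ';'-separated line
formats of UnicodeData.txt and DerivedCoreProperties.txt.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-code-point flags; these names are not from makeucnid.cc.
enum : unsigned { ALPHA = 1, NUMERIC = 2 };

using FlagMap = std::map<uint32_t, unsigned>;

// Split a line on ';' and trim spaces/tabs around each field.
static std::vector<std::string> split_fields(const std::string &line) {
  std::vector<std::string> out;
  std::string field;
  std::istringstream in(line);
  while (std::getline(in, field, ';')) {
    size_t b = field.find_first_not_of(" \t");
    size_t e = field.find_last_not_of(" \t");
    out.push_back(b == std::string::npos ? "" : field.substr(b, e - b + 1));
  }
  return out;
}

// UnicodeData.txt line, e.g. "0661;ARABIC-INDIC DIGIT ONE;Nd;...".
// Sets NUMERIC when the 3rd field (general category) is Nd, Nl or No.
// The <..., First>/<..., Last> range pairs are not expanded here.
void add_unicode_data_line(const std::string &line, FlagMap &flags) {
  std::vector<std::string> f = split_fields(line);
  if (f.size() < 3) return;
  const std::string &cat = f[2];
  if (cat == "Nd" || cat == "Nl" || cat == "No")
    flags[std::stoul(f[0], nullptr, 16)] |= NUMERIC;
}

// DerivedCoreProperties.txt line, e.g. "0041..005A ; Alphabetic # ..."
// or "00AA ; Alphabetic # ...".  Sets ALPHA on every code point listed.
void add_derived_core_line(const std::string &line, FlagMap &flags) {
  // Everything after '#' is a comment in this file.
  std::vector<std::string> f = split_fields(line.substr(0, line.find('#')));
  if (f.size() < 2 || f[1] != "Alphabetic") return;
  size_t dots = f[0].find("..");
  uint32_t first = std::stoul(f[0], nullptr, 16);  // stops at the first '.'
  uint32_t last = dots == std::string::npos
                      ? first
                      : std::stoul(f[0].substr(dots + 2), nullptr, 16);
  assert(first <= last);
  for (uint32_t c = first; c <= last; ++c) flags[c] |= ALPHA;
}

struct Range { uint32_t first, last; unsigned flags; };

// Collapse per-code-point flags into maximal runs of adjacent code points
// carrying identical flags; the size of the result hints at table growth.
std::vector<Range> to_ranges(const FlagMap &flags) {
  std::vector<Range> out;
  for (const auto &kv : flags) {
    if (!out.empty() && out.back().last + 1 == kv.first &&
        out.back().flags == kv.second)
      out.back().last = kv.first;
    else
      out.push_back({kv.first, kv.first, kv.second});
  }
  return out;
}

// A character is alphanumeric if either flag is set.
bool is_alphanumeric(const FlagMap &flags, uint32_t cp) {
  auto it = flags.find(cp);
  return it != flags.end() && (it->second & (ALPHA | NUMERIC)) != 0;
}
```

Feeding every line of both files through the two add_* functions and
comparing to_ranges(flags).size() before and after merging with the
existing flags would give a rough idea of how much a combined table
grows, which is what decides between one table and a separate one.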

        Jakub
