KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934770067
##########
lang/c/src/schema.c:
##########
@@ -48,26 +51,50 @@ static void avro_schema_init(avro_schema_t schema,
avro_type_t type)
static int is_avro_id(const char *name)
{
- size_t i, len;
if (name) {
- len = strlen(name);
- if (len < 1) {
- return 0;
- }
- for (i = 0; i < len; i++) {
- if (!(isalpha(name[i])
- || name[i] == '_' || (i && isdigit(name[i])))) {
+ size_t len = strlen(name);
+ if (len < 1) {
+ return 0;
+ }
+
+ locale_t loc = newlocale(LC_ALL_MASK, "en_US.UTF-8", (locale_t) 0);
+ locale_t currentLoc = (locale_t) 0;
+ if (loc) {
+ currentLoc = uselocale(loc);
+ }
+ else {
+ setlocale(LC_ALL, "en_US.UTF-8");
+ }
+
+ size_t mbslen = mbstowcs(NULL, name, 0);
+ wchar_t wsName[mbslen + 1];
+ mbstowcs(wsName, name, mbslen + 1);
Review Comment:
<https://unicode-org.github.io/icu/userguide/strings/properties.html#enumerated-property-over-string>
mentions "UTF-8 macros" that would apparently let you look up properties of a
UTF-8 encoded character without first recoding to wchar_t. If you could use
those and rely solely on ICU instead of setting up a locale, the code would be
more easily portable to Windows.