gemini-code-assist[bot] commented on code in PR #442:
URL: https://github.com/apache/tvm-ffi/pull/442#discussion_r2796310044
##########
src/ffi/extra/json_parser.cc:
##########
@@ -341,7 +341,7 @@ class JSONParserContext {
// copy over the prefix that was already parsed
std::string out_str(start_pos + 1, cur_ - start_pos - 1);
while (cur_ != end_) {
- if (*cur_ < ' ') {
+ if (static_cast<unsigned char>(*cur_) < ' ') {
Review Comment:

For consistency with other parts of this file (e.g., `Peek()` and
`Parse4Hex`), consider using `*reinterpret_cast<const uint8_t*>(cur_)` to treat
the byte as an unsigned value. This would align with the approach used
elsewhere in the parser for handling character bytes.
```c
if (*reinterpret_cast<const uint8_t*>(cur_) < ' ') {
```
##########
src/ffi/extra/json_parser.cc:
##########
@@ -131,7 +131,7 @@ class JSONParserContext {
++cur_;
return true;
}
- if (*cur_ < ' ' || *cur_ == '\\') {
+ if (static_cast<unsigned char>(*cur_) < ' ' || *cur_ == '\\') {
Review Comment:

For consistency with other parts of this file (e.g., `Peek()` and
`Parse4Hex`), consider using `*reinterpret_cast<const uint8_t*>(cur_)` to treat
the byte as an unsigned value. While `static_cast<unsigned char>(*cur_)` is
correct, using a consistent approach for byte interpretation can improve code
clarity and maintainability.
```c
if (*reinterpret_cast<const uint8_t*>(cur_) < ' ' || *cur_ == '\') {
```
##########
tests/cpp/extra/test_json_parser.cc:
##########
@@ -290,6 +290,12 @@ TEST(JSONParser, UnicodeEdgeCases) {
u8"\U0001F600\U0001F601");
}
+TEST(JSONParser, RawUTF8Bytes) {
+ // Regression test: raw UTF-8 bytes (>= 0x80) must not be rejected as
control characters.
+ // This failed when the parser used signed char comparison: *cur_ < ' '
+ EXPECT_EQ(json::Parse("\"\xE4\xB8\xAD\xE6\x96\x87\"").cast<String>(),
"\xE4\xB8\xAD\xE6\x96\x87");
Review Comment:

This regression test is great for verifying the fix. To make it more robust,
consider adding a few more assertions to cover a wider range of scenarios.
Using `u8` string literals can also improve readability while testing the same
underlying byte sequences.
```suggestion
EXPECT_EQ(json::Parse(u8"\"中文\"").cast<String>(), u8"中文");
// Test with a mix of ASCII and non-ASCII characters.
EXPECT_EQ(json::Parse(u8"\"a中b\"").cast<String>(), u8"a中b");
// Test with a two-byte UTF-8 character (cent sign).
EXPECT_EQ(json::Parse(u8"\"¢\"").cast<String>(), u8"¢");
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]