gemini-code-assist[bot] commented on code in PR #387:
URL: https://github.com/apache/tvm-ffi/pull/387#discussion_r2666961011
##########
src/ffi/dtype.cc:
##########
@@ -293,19 +306,23 @@ inline DLDataType
StringViewToDLDataType_(std::string_view str) {
scan = str.data();
TVM_FFI_THROW(ValueError) << "unknown dtype `" << str << '`';
}
- char* xdelim; // emulate sscanf("%ux%u", bits, lanes)
- uint8_t bits = static_cast<uint8_t>(strtoul(scan, &xdelim, 10));
+ // Parse bits manually to handle non-null-terminated string_view
+ const char* xdelim = scan;
+ uint8_t bits = static_cast<uint8_t>(parse_digits(xdelim, str_end));
if (bits != 0) dtype.bits = bits;
Review Comment:

The parsed value for `bits` is cast to `uint8_t` without checking for
overflow. A value like `int257` would be silently truncated to `1`, which is
incorrect. Adding a check to ensure the value fits within `uint8_t` would make
the parsing more robust and prevent unexpected behavior.
```c
uint32_t bits_val = parse_digits(xdelim, str_end);
if (bits_val > UINT8_MAX) {
TVM_FFI_THROW(ValueError) << "Bits value " << bits_val << " is out of
range for uint8_t in dtype '"
<< str << "'";
}
uint8_t bits = static_cast<uint8_t>(bits_val);
if (bits != 0) dtype.bits = bits;
```
##########
src/ffi/dtype.cc:
##########
@@ -206,17 +206,30 @@ inline DLDataType
StringViewToDLDataType_(std::string_view str) {
dtype.bits = 32;
dtype.lanes = 1;
const char* scan;
+ const char* str_end = str.data() + str.length();
+
+ // Helper lambda to parse decimal digits from a bounded string_view
+ // Returns the parsed value and updates ptr to point past the last digit
+ auto parse_digits = [](const char*& ptr, const char* end) -> uint32_t {
+ uint32_t value = 0;
+ while (ptr < end && *ptr >= '0' && *ptr <= '9') {
+ value = value * 10 + (*ptr - '0');
+ ptr++;
+ }
+ return value;
+ };
Review Comment:

The `parse_digits` lambda is vulnerable to integer overflow if a parsed
number exceeds `UINT32_MAX`. While the original `strtoul` also had overflow
issues that weren't checked in the previous code, this new implementation can
be made more robust by explicitly detecting and handling overflow. A good
approach is to use a `uint64_t` for accumulation and then check if the value
exceeds `UINT32_MAX` before returning.
```c
auto parse_digits = [](const char*& ptr, const char* end) -> uint32_t {
uint64_t value = 0;
const char* start_ptr = ptr;
while (ptr < end && *ptr >= '0' && *ptr <= '9') {
value = value * 10 + (*ptr - '0');
ptr++;
}
if (value > UINT32_MAX) {
TVM_FFI_THROW(ValueError) << "Integer value in dtype string '"
<< std::string_view(start_ptr, ptr -
start_ptr)
<< "' is out of range for uint32_t";
}
return static_cast<uint32_t>(value);
};
```
##########
src/ffi/dtype.cc:
##########
@@ -293,19 +306,23 @@ inline DLDataType
StringViewToDLDataType_(std::string_view str) {
scan = str.data();
TVM_FFI_THROW(ValueError) << "unknown dtype `" << str << '`';
}
- char* xdelim; // emulate sscanf("%ux%u", bits, lanes)
- uint8_t bits = static_cast<uint8_t>(strtoul(scan, &xdelim, 10));
+ // Parse bits manually to handle non-null-terminated string_view
+ const char* xdelim = scan;
+ uint8_t bits = static_cast<uint8_t>(parse_digits(xdelim, str_end));
if (bits != 0) dtype.bits = bits;
int scalable_multiplier = 1;
- if (strncmp(xdelim, "xvscale", 7) == 0) {
+ // Check bounds before dereferencing xdelim
+ if (xdelim < str_end && strncmp(xdelim, "xvscale", 7) == 0) {
scalable_multiplier = -1;
xdelim += 7;
}
- char* endpt = xdelim;
- if (*xdelim == 'x') {
- dtype.lanes = static_cast<uint16_t>(scalable_multiplier * strtoul(xdelim +
1, &endpt, 10));
+ const char* endpt = xdelim;
+ // Check bounds before dereferencing xdelim
+ if (xdelim < str_end && *xdelim == 'x') {
+ endpt = xdelim + 1;
+ dtype.lanes = static_cast<uint16_t>(scalable_multiplier *
parse_digits(endpt, str_end));
}
Review Comment:

Similar to the `parse_float` case, the number of lanes is not validated
here. An empty or zero value for lanes after `x` should probably be an error.
Also, overflow against `uint16_t` is not checked. This could lead to silent
truncation or misconfiguration.
```c
if (xdelim < str_end && *xdelim == 'x') {
endpt = xdelim + 1;
const char* digits_start = endpt;
uint32_t lanes_val = parse_digits(endpt, str_end);
if (endpt == digits_start || lanes_val == 0) {
TVM_FFI_THROW(ValueError) << "Invalid lanes specification in dtype '"
<< str
<< "'. Lanes must be a positive integer.";
}
if (lanes_val > UINT16_MAX) {
TVM_FFI_THROW(ValueError) << "Lanes value " << lanes_val
<< " is out of range for uint16_t in dtype
'" << str << "'";
}
dtype.lanes = static_cast<uint16_t>(scalable_multiplier * lanes_val);
}
```
##########
src/ffi/dtype.cc:
##########
@@ -206,17 +206,30 @@ inline DLDataType
StringViewToDLDataType_(std::string_view str) {
dtype.bits = 32;
dtype.lanes = 1;
const char* scan;
+ const char* str_end = str.data() + str.length();
+
+ // Helper lambda to parse decimal digits from a bounded string_view
+ // Returns the parsed value and updates ptr to point past the last digit
+ auto parse_digits = [](const char*& ptr, const char* end) -> uint32_t {
+ uint32_t value = 0;
+ while (ptr < end && *ptr >= '0' && *ptr <= '9') {
+ value = value * 10 + (*ptr - '0');
+ ptr++;
+ }
+ return value;
+ };
auto parse_float = [&](const std::string_view& str, int offset, int code,
int bits) {
dtype.code = static_cast<uint8_t>(code);
dtype.bits = static_cast<uint8_t>(bits);
scan = str.data() + offset;
- char* endpt = nullptr;
- if (*scan == 'x') {
- dtype.lanes = static_cast<uint16_t>(strtoul(scan + 1, &endpt, 10));
+ const char* endpt = scan;
+ if (scan < str_end && *scan == 'x') {
+ endpt = scan + 1;
+ dtype.lanes = static_cast<uint16_t>(parse_digits(endpt, str_end));
scan = endpt;
}
Review Comment:

The current implementation doesn't validate the parsed number of lanes. It's
possible to have a string like `float32x` (with no digits after `x`) or
`float32x0`, which would result in `lanes=0`. This is likely incorrect, as the
number of lanes should be a positive integer. Additionally, a value larger than
`UINT16_MAX` would be silently truncated. It would be more robust to validate
the parsed value.
```c
if (scan < str_end && *scan == 'x') {
endpt = scan + 1;
const char* digits_start = endpt;
uint32_t lanes_val = parse_digits(endpt, str_end);
if (endpt == digits_start || lanes_val == 0) {
TVM_FFI_THROW(ValueError) << "Invalid lanes specification in dtype
'" << str
<< "'. Lanes must be a positive integer.";
}
if (lanes_val > UINT16_MAX) {
TVM_FFI_THROW(ValueError) << "Lanes value " << lanes_val
<< " is out of range for uint16_t in dtype
'" << str << "'";
}
dtype.lanes = static_cast<uint16_t>(lanes_val);
scan = endpt;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]