lriggs opened a new issue, #50140:
URL: https://github.com/apache/arrow/issues/50140
### Describe the bug, including details regarding any error messages, version, and platform.
# [C++][Gandiva] castVARCHAR(decimal128) can corrupt native memory and return invalid buffers
## Describe the bug
The Gandiva `castVARCHAR_decimal128_int64` function path can corrupt native
memory and crash the host process (SIGSEGV) when the arena allocation for the
output string fails, for example when a `CAST(decimal AS VARCHAR)` runs under
memory pressure.
There are three independent problems that combine to produce the crash:
### 1. `castVARCHAR` decimal128 entry is missing `kCanReturnErrors`
In `function_registry_string.cc`, the `castVARCHAR` registry entry for
`decimal128` is registered with only `NativeFunction::kNeedsContext`. Unlike the
other error-producing cast/string functions, it does **not** set
`NativeFunction::kCanReturnErrors`.
Because of this, generated LLVM code assumes the function can never fail and
skips the post-call error check. Any error the function reports via the context
is silently ignored, and execution continues with whatever (invalid) buffer and
length the function returned.
### 2. `gdv_fn_dec_to_string` reports a positive length on allocation failure
In `gdv_function_stubs.cc`, `gdv_fn_dec_to_string` writes the output length
*before* it checks whether the allocation succeeded:
```cpp
*dec_str_len = static_cast<int32_t>(dec_str.length());  // positive length
char* ret = reinterpret_cast<char*>(
    gdv_fn_context_arena_malloc(context, *dec_str_len));
if (ret == nullptr) {
  // error is set, but *dec_str_len is still positive
  return nullptr;
}
```
When the allocation fails, the function returns `nullptr` while `*dec_str_len`
still holds a positive value. The caller then copies from a null/invalid buffer
using that positive length, i.e. effectively
`memcpy(dst, nullptr, positive_len)`, which is undefined behavior and crashes.
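A minimal sketch of a safer ordering, assuming a hypothetical `MockArenaMalloc` in place of `gdv_fn_context_arena_malloc` and a precomputed string in place of the decimal-formatting step. The out-parameter is cleared up front and the real length is written only after the allocation succeeds, so a failed call always yields `(nullptr, 0)`. This is one possible shape for the fix, not the upstream patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical fixed-size arena: returns nullptr when `fail` is set or the
// request does not fit, simulating allocation failure under memory pressure.
char* MockArenaMalloc(bool fail, int32_t size) {
  static char arena[64];
  if (fail || size < 0 || size > static_cast<int32_t>(sizeof(arena))) {
    return nullptr;
  }
  return arena;
}

// Hypothetical stand-in for gdv_fn_dec_to_string with the length written
// last: callers can never observe a null buffer paired with a positive length.
const char* DecToStringSketch(bool fail_alloc, const std::string& dec_str,
                              int32_t* dec_str_len) {
  *dec_str_len = 0;
  const int32_t len = static_cast<int32_t>(dec_str.length());
  char* ret = MockArenaMalloc(fail_alloc, len);
  if (ret == nullptr) {
    // The real function would also set an error message on the context here.
    return nullptr;
  }
  std::memcpy(ret, dec_str.data(), len);
  *dec_str_len = len;  // publish the length only after a successful copy
  return ret;
}
```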
### 3. `castVARCHAR_decimal128_int64` does not validate its output length
In `precompiled/decimal_wrapper.cc`, `castVARCHAR_decimal128_int64` computes the
truncated length and dereferences/returns the buffer from `gdv_fn_dec_to_string`
without:
- validating that the requested output length (`out_len_param`) is
  non-negative, or
- handling the case where the upstream allocation failed.
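The two missing checks could look roughly like the sketch below. `TruncateDecimalString` is a hypothetical helper, not Gandiva code: `buf` and `full_len` stand in for the result of `gdv_fn_dec_to_string`, and the `error` out-parameter stands in for setting an error message on the context.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper sketching both checks. On any failure it returns
// nullptr, sets *error, and writes *out_length = 0, so no caller can copy
// from an invalid buffer with a non-zero length.
const char* TruncateDecimalString(const char* buf, int32_t full_len,
                                  int64_t out_len_param, bool* error,
                                  int32_t* out_length) {
  *error = false;
  *out_length = 0;
  if (out_len_param < 0) {
    // Check 1: reject a negative requested length.
    // Real code would record an error message on the context here.
    *error = true;
    return nullptr;
  }
  if (buf == nullptr || full_len < 0) {
    // Check 2: the upstream allocation failed; propagate the failure.
    *error = true;
    return nullptr;
  }
  // Compare in 64 bits so a large out_len_param cannot overflow int32_t;
  // the minimum never exceeds full_len, so the narrowing cast is safe.
  const int64_t truncated =
      out_len_param < full_len ? out_len_param : static_cast<int64_t>(full_len);
  *out_length = static_cast<int32_t>(truncated);
  return buf;
}
```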
A negative output length flows straight through into the output length used by
the copy, which can produce a huge unsigned size when interpreted by the memory
copy routine.
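The "huge unsigned size" follows from C++'s integral conversion rules: converting a negative signed value to an unsigned type such as `std::size_t` (the type of `memcpy`'s count argument) wraps modulo 2^N. The hypothetical helper `AsCopySize` below only makes that implicit conversion explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: performs the same int32_t -> std::size_t conversion
// that happens implicitly when a length is passed as memcpy's count.
// The conversion is modular, so -1 becomes the largest std::size_t value.
std::size_t AsCopySize(int32_t len) { return static_cast<std::size_t>(len); }
```

A copy of that many bytes runs far past any real buffer, which is why a negative length must be rejected before it reaches the copy.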
### Component(s)
C++, Gandiva