thisisnic commented on issue #47957:
URL: https://github.com/apache/arrow/issues/47957#issuecomment-3475001679
Hey, I did a bit of a session with Claude where we looked through what
changed etc and here's the output. No longer looking into this myself as I have
other tasks that need working on, but pasting here in case it's useful.
It's a bit opinionated!
_________________________________________________
Investigation Summary
After investigating the ARM64 macOS 14 substrait test failures, it looks
like protobuf 33.0 is the likely culprit.
Timeline
- Oct 13, 2025: Tests passing with runner image 20251013.0032 (protobuf
32.1)
- Oct 15, 2025: Homebrew updated protobuf from 32.1 to 33.0
(https://github.com/Homebrew/homebrew-core/commit/fd68e3781aa5cca3f377f5777d70b6dfdfe4b0f8)
- Oct 20, 2025: Tests failing with runner image 20251020.0056 (protobuf
33.0)
Technical Details
Bug location: cpp/src/arrow/engine/substrait/expression_internal.cc:428-432
The issue occurs during deserialization of user-defined literals:
```
Status Visit(const IntegerType& type) {
  google::protobuf::UInt64Value value;
  if (ARROW_PREDICT_FALSE(!user_defined_->value().UnpackTo(&value))) {
    return FailedToUnpack("integer", "UInt64Value");
  }
  ARROW_ASSIGN_OR_RAISE(scalar_, MakeScalar(type.GetSharedPtr(), value.value()));
  return Status::OK();
}
```
`UnpackTo()` returns `true` (success), but `value.value()` returns 0 instead
of the actual value.
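One way to narrow down whether the value is lost when packing or when unpacking would be to dump the raw bytes held in the `Any` message's `value` field and compare them with the expected wire encoding. The sketch below is standard-library C++ only and does not call protobuf; the helper names are hypothetical and not part of Arrow or protobuf. It assumes the documented encoding of `google.protobuf.UInt64Value` (a single varint field with number 1), so a payload carrying 7 should be the two bytes `08 07`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Encode v as a protobuf base-128 varint: 7 payload bits per byte,
// least-significant group first, high bit set on every byte but the last.
std::vector<uint8_t> EncodeVarint(uint64_t v) {
  std::vector<uint8_t> out;
  while (v >= 0x80) {
    out.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
  return out;
}

// Expected serialized form of a UInt64Value holding v (hypothetical helper).
std::vector<uint8_t> EncodeUInt64Value(uint64_t v) {
  std::vector<uint8_t> out;
  if (v == 0) return out;  // proto3 omits a scalar field set to its default
  out.push_back(0x08);     // tag = (field number 1 << 3) | wire type 0 (varint)
  const std::vector<uint8_t> varint = EncodeVarint(v);
  out.insert(out.end(), varint.begin(), varint.end());
  return out;
}

// Decode bytes in the form produced by EncodeUInt64Value.
// Returns std::nullopt for any other input (wrong tag, truncated or
// over-long varint, trailing bytes).
std::optional<uint64_t> DecodeUInt64Value(const std::vector<uint8_t>& bytes) {
  if (bytes.empty()) return std::optional<uint64_t>(0);  // absent field = default
  if (bytes[0] != 0x08) return std::nullopt;
  uint64_t value = 0;
  int shift = 0;
  for (std::size_t i = 1; i < bytes.size(); ++i) {
    if (shift >= 64) return std::nullopt;  // varint longer than 10 bytes
    value |= static_cast<uint64_t>(bytes[i] & 0x7F) << shift;
    if ((bytes[i] & 0x80) == 0) {
      // Final varint byte: accept only if nothing follows it.
      if (i + 1 == bytes.size()) return std::optional<uint64_t>(value);
      return std::nullopt;
    }
    shift += 7;
  }
  return std::nullopt;  // ran out of bytes mid-varint
}

// Small self-check covering the value used by the failing test.
void CheckKnownEncodings() {
  assert((EncodeUInt64Value(7) == std::vector<uint8_t>{0x08, 0x07}));
  assert(DecodeUInt64Value(EncodeUInt64Value(7)) == std::optional<uint64_t>(7));
}
```

If the payload bytes on the failing runner are `08 07`, the packing side is behaving as expected and the problem would be on the parsing side; an empty payload would instead point at serialization.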
Test failures: arrow-substrait-substrait-test - ArrowSpecificLiterals test
in cpp/src/arrow/engine/substrait/serde_test.cc:607-614
- UInt8Scalar(7) deserializes as 0
- String "hello" deserializes as ""
Platform Impact
- ❌ ARM64 macOS 14: FAILING
- ✅ AMD64 macOS 13: PASSING
- ✅ Linux (all): PASSING
- ✅ Windows: PASSING
This pattern suggests an ARM64-specific regression in protobuf 33.0's
google::protobuf::Any::UnpackTo() implementation, though this has not been
confirmed.
Attempted Verification
I attempted to confirm the hypothesis with diagnostic PR #48015, which was
intended to pin protobuf to 32.1 on ARM64 macOS, but ran into difficulties
with Homebrew dependency management: gRPC requires protobuf 33.0, and there
are no versioned formulas for protobuf 32.x.
Recommendation
Given the strong timeline correlation and platform-specific nature, I
recommend:
1. Reporting this as a suspected regression to the protobuf team
2. Temporarily pinning Arrow's ARM64 macOS CI to protobuf 32.1 until
resolved
3. Monitoring for protobuf 33.x patches