chaokunyang commented on code in PR #3455:
URL: https://github.com/apache/fory/pull/3455#discussion_r2894921550
##########
cpp/fory/serialization/collection_serializer.h:
##########
@@ -677,6 +691,13 @@ struct Serializer<
if (FORY_PREDICT_FALSE(ctx.has_error())) {
return std::vector<T, Alloc>();
}
+
+ if (FORY_PREDICT_FALSE(length > ctx.config().max_collection_size)) {
+ ctx.set_error(
+ Error::invalid_data("Collection length exceeds
max_collection_size"));
+ return std::vector<T, Alloc>();
+ }
+
Review Comment:
Nice addition on the read() path. One gap remains: several read_data() paths
still trust payload size without max_collection_size checks (for example
vector<T> non-arithmetic, vector<bool>, list, deque, and forward_list). Those
paths are used in native/nested deserialization, so guardrails can still be
bypassed there.
##########
cpp/fory/serialization/collection_serializer.h:
##########
@@ -1161,6 +1189,13 @@ template <typename T, typename Alloc> struct
Serializer<std::deque<T, Alloc>> {
if (FORY_PREDICT_FALSE(ctx.has_error())) {
return std::deque<T, Alloc>();
}
+
+ if (FORY_PREDICT_FALSE(length > ctx.config().max_collection_size)) {
+ ctx.set_error(
+ Error::invalid_data("Collection length exceeds
max_collection_size"));
+ return std::deque<T, Alloc>();
+ }
+
Review Comment:
There is still an unguarded preallocation in forward_list::read:
temp.reserve(length) happens before any max_collection_size validation. A
malicious length can force a large allocation before we fail. Please validate
length immediately after reading it, before reserve.
##########
cpp/fory/serialization/collection_serializer_test.cc:
##########
@@ -620,6 +620,47 @@ TEST(CollectionSerializerTest, ForwardListEmptyRoundTrip) {
EXPECT_TRUE(deserialized.strings.empty());
}
+// Test max_collection_size using objects (e.g., strings)
+TEST(CollectionSerializerTest, MaxCollectionSizeGuardrail) {
+ auto fory = Fory::builder()
Review Comment:
These tests are helpful, but they only cover xlang(true). The new limits
should also be validated in xlang(false)/read_data paths, otherwise we can miss
regressions in native mode (especially for vector/list/deque/forward_list).
##########
cpp/fory/serialization/context.h:
##########
@@ -361,6 +361,9 @@ class WriteContext {
/// reset context for reuse.
void reset();
+ /// get associated configuration.
+ inline const Config &config() const { return *config_; }
Review Comment:
Read-side code now uses config(), but I do not see write-side usage of
WriteContext::config(). If no write-path consumer is planned, we can drop this
accessor to keep the API surface smaller.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]