mohit7705 commented on code in PR #48792:
URL: https://github.com/apache/arrow/pull/48792#discussion_r2677745918
##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -1249,42 +1249,52 @@ Result<std::unique_ptr<substrait::Expression>>
MakeListElementReference(
Result<std::unique_ptr<substrait::Expression::ScalarFunction>>
EncodeSubstraitCall(
const SubstraitCall& call, ExtensionSet* ext_set,
const ConversionOptions& conversion_options) {
+
+ auto scalar_fn =
+ std::make_unique<substrait::Expression::ScalarFunction>();
+
+ // Encode function ONCE per scalar function (FIX)
Review Comment:
Thanks for the review and for pointing this out. @ahsanabbas123 ,
@HyukjinKwon
I want to clarify that the earlier discussion can be ignored — I’ve updated
the implementation since then.
What changed:
The fix now ensures that EncodeFunction(call.id()) is invoked exactly once
per ScalarFunction, by constructing the ScalarFunction object before
recursively encoding arguments. This prevents repeated registration of the same
function URI when serializing nested expressions (e.g. large OR chains), which
was the root cause of the exponential growth.
Why this fixes the issue:
Previously, function encoding could occur during recursive argument
serialization, causing duplicate URI registrations. With this change, the
function reference is encoded once per scalar function, regardless of nesting
depth.
Testing:
I reproduced the original issue using a Python script that serializes
expressions with increasing OR conditions and verified that:
serialization still succeeds
the number of registered extension URIs no longer grows exponentially
no regressions were observed
Please let me know if you’d like me to add a regression test for this, or if
the explanation can be improved.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]