mohit7705 opened a new pull request, #48792:
URL: https://github.com/apache/arrow/pull/48792

   ### What does this PR do?
   
   This PR fixes an issue where Substrait scalar expressions could cause
   duplicate registration of the same function URI during serialization.
   
   The function reference was being encoded multiple times while building
   nested expressions (e.g., large OR chains), leading to exponential growth
   in the extension URI list and in the serialized plan size.
   
   ### What was changed?
   
   - Ensure `EncodeFunction(call.id())` is invoked exactly once per
     `ScalarFunction` encoding.
   - Avoid repeated URI registration while serializing nested expressions.
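The following is a minimal Python sketch of the invariant this change aims for. It is not Arrow's C++ implementation. `ExtensionSet` is hypothetical, and its `encode_function` method stands in for `EncodeFunction`: it reuses the anchor of an already-registered URI or function instead of adding a new entry. A recursive `encode` routine calls it exactly once per scalar function node. The URI strings, class names, and dict-based output format are illustrative assumptions, not the real Substrait protobuf.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical stand-ins for Substrait extension URIs (illustrative only).
BOOLEAN_URI = "functions_boolean.yaml"
COMPARISON_URI = "functions_comparison.yaml"


@dataclass
class Call:
    """A scalar function call node identified by (uri, name)."""
    uri: str
    name: str
    args: List["Expr"]


@dataclass
class FieldRef:
    name: str


@dataclass
class Literal:
    value: int


Expr = Union[Call, FieldRef, Literal]


@dataclass
class ExtensionSet:
    """Hypothetical registry: each URI and each function gets one anchor."""
    uris: Dict[str, int] = field(default_factory=dict)
    functions: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def encode_function(self, uri: str, name: str) -> int:
        # Register the URI only if it has not been seen before.
        if uri not in self.uris:
            self.uris[uri] = len(self.uris) + 1
        # Likewise, reuse the existing anchor for a known (uri, name) pair.
        key = (uri, name)
        if key not in self.functions:
            self.functions[key] = len(self.functions) + 1
        return self.functions[key]


def encode(expr: Expr, ext: ExtensionSet) -> dict:
    """Serialize an expression tree into a plain dict (toy format)."""
    if isinstance(expr, FieldRef):
        return {"field": expr.name}
    if isinstance(expr, Literal):
        return {"literal": expr.value}
    # Exactly one encode_function call per scalar function node.
    anchor = ext.encode_function(expr.uri, expr.name)
    return {
        "function_reference": anchor,
        "arguments": [encode(arg, ext) for arg in expr.args],
    }


def or_chain(n: int) -> Expr:
    """Build x == 0 OR x == 1 OR ... OR x == n-1 as a left-deep tree."""
    def eq(i: int) -> Call:
        return Call(COMPARISON_URI, "equal", [FieldRef("x"), Literal(i)])

    expr: Expr = eq(0)
    for i in range(1, n):
        expr = Call(BOOLEAN_URI, "or", [expr, eq(i)])
    return expr


if __name__ == "__main__":
    for n in (1, 10, 100):
        ext = ExtensionSet()
        encode(or_chain(n), ext)
        # Prints "1 1 1", then "10 2 2", then "100 2 2".
        print(n, len(ext.uris), len(ext.functions))
```

With deduplication in place, the URI and function tables in this sketch stay at two entries however long the OR chain gets. Only the encoded expression tree itself grows, and it grows linearly with the number of conditions.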
   
   ### Why is this needed?
   
   Without this fix, expressions with many nested logical operators
   (e.g., chained OR conditions) cause the serialized Substrait plan to grow
   exponentially as conditions are added, which can severely impact memory
   usage and performance.
   
   ### Testing
   
   - Reproduced the issue using a Python script that serializes expressions
     with an increasing number of OR conditions.
   - Verified that serialization still succeeds and that the fix does not
     introduce regressions.
   
   ### Related issue
   
   Fixes #48761
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.