arne-bdt opened a new issue, #3323:
URL: https://github.com/apache/jena/issues/3323

   ### Version
   
   5.6.0-SNAPSHOT
   
   ### Feature
   
   We've observed that the SPARQL COALESCE function in Jena is significantly slower than an equivalent IF(BOUND(...), ..., ...) pattern. This appears to be because COALESCE's implementation relies on exception handling for unbound variables.
   
   **Steps to Reproduce:**
   
   Consider these two functionally equivalent queries:
   
   Query 1 (using COALESCE - slower):
   ```sparql
   # Placeholder namespace: the report does not specify one
   PREFIX : <http://example.org/>
   SELECT ?person (COALESCE(?nickname, ?fullname) as ?displayName)
   WHERE {
       ?person :hasFullName ?fullname .
       OPTIONAL { ?person :hasNickname ?nickname . }
   }
   ```
   
   Query 2 (using IF/BOUND - faster):
   ```sparql
   # Placeholder namespace: the report does not specify one
   PREFIX : <http://example.org/>
   SELECT ?person ?displayName
   WHERE {
       ?person :hasFullName ?fullname .
       OPTIONAL { ?person :hasNickname ?nickname . }
       BIND(IF(BOUND(?nickname), ?nickname, ?fullname) as ?displayName)
   }
   ```
   
   Both queries return the nickname if available, otherwise the full name. However, Query 2 performs significantly better than Query 1.
   
   **Expected Behavior:**
   
   COALESCE should have comparable performance to the IF(BOUND(...), ..., ...) pattern for simple cases involving unbound variables.
   
   **Actual Behavior:**
   
   COALESCE is noticeably slower, particularly when the leading arguments are frequently unbound.
   
   **Analysis:**
   
   Based on my understanding, COALESCE evaluates its arguments in order. When evaluating an argument throws an `ExprEvalException`, as it does for an unbound variable, COALESCE catches the exception and moves on to the next argument. This exception-handling overhead appears to cause the slowdown, especially with OPTIONAL patterns, where variables are often unbound.
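   To make the suspected cost concrete, here is a small self-contained Java model. It is not Jena code. A solution row is a `Map<String, String>`. Evaluating an unbound variable throws a hypothetical `UnboundException`, which stands in for Jena's `ExprEvalException`. The exception-driven `coalesce` catches that exception and tries the next argument, while `ifBound` mirrors IF(BOUND(...)) by checking the row first. The model counts exceptions rather than timing them. The real cost in Jena depends on how its exceptions are constructed and on the JVM.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   public class CoalesceModel {

       // Hypothetical stand-in for Jena's ExprEvalException (not the real class).
       static class UnboundException extends RuntimeException {
           UnboundException(String message) { super(message); }
       }

       // Counts failed variable evaluations, i.e. exceptions thrown by evalVar.
       static int exceptionsThrown = 0;

       // Models evaluating a variable: throws when it is unbound in the row.
       static String evalVar(Map<String, String> row, String var) {
           String value = row.get(var);
           if (value == null) {
               exceptionsThrown++;
               throw new UnboundException("Unbound: " + var);
           }
           return value;
       }

       // Exception-driven COALESCE: try each argument, catch failures, move on.
       static String coalesce(Map<String, String> row, String... vars) {
           for (String var : vars) {
               try {
                   return evalVar(row, var);
               } catch (UnboundException e) {
                   // fall through to the next argument
               }
           }
           throw new UnboundException("COALESCE: no bound argument");
       }

       // IF(BOUND(?first), ?first, ?second): a map lookup picks the branch,
       // so an unbound ?first never causes an exception.
       static String ifBound(Map<String, String> row, String first, String second) {
           return row.containsKey(first) ? row.get(first) : evalVar(row, second);
       }

       public static void main(String[] args) {
           List<Map<String, String>> rows = new ArrayList<>();
           for (int i = 0; i < 10; i++) {
               Map<String, String> row = new HashMap<>();
               row.put("fullname", "Person " + i);
               if (i % 5 == 0) { // only rows 0 and 5 have a nickname
                   row.put("nickname", "Nick " + i);
               }
               rows.add(row);
           }

           exceptionsThrown = 0;
           List<String> viaCoalesce = new ArrayList<>();
           for (Map<String, String> row : rows) {
               viaCoalesce.add(coalesce(row, "nickname", "fullname"));
           }
           int coalesceExceptions = exceptionsThrown;

           exceptionsThrown = 0;
           List<String> viaIfBound = new ArrayList<>();
           for (Map<String, String> row : rows) {
               viaIfBound.add(ifBound(row, "nickname", "fullname"));
           }
           int ifBoundExceptions = exceptionsThrown;

           System.out.println("Same results: " + viaCoalesce.equals(viaIfBound));
           System.out.println("Exceptions (COALESCE): " + coalesceExceptions);
           System.out.println("Exceptions (IF/BOUND): " + ifBoundExceptions);
       }
   }
   ```

   In this run, 2 of the 10 rows carry a nickname. The program prints `Same results: true` and reports 8 exceptions for COALESCE and 0 for IF/BOUND: one exception per row in which the first argument is unbound.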
   
   **Suggested Solution:**
   
   Extend the `org.apache.jena.sparql.expr.Expr` interface with a method that allows a fast pre-evaluation check:
   
   ```java
   // Expr is an interface, so the new method must be a default method
   default boolean canEvaluate(Binding binding, FunctionEnv env) {
       return true; // default implementation for backward compatibility
   }
   ```
   
   Implementations such as `ExprVar` could then override it with a fast bound check:
   ```java
   @Override
   public boolean canEvaluate(Binding binding, FunctionEnv env) {
       // asVar() returns the variable this ExprVar refers to
       return binding != null && binding.contains(asVar());
   }
   ```
   
   This would let COALESCE pre-check each argument before evaluating it. Arguments known to fail, such as unbound variables, would be skipped without the exception-handling overhead. Arguments whose check passes would be evaluated as before, still inside the existing error handling, because a passing check does not guarantee that evaluation succeeds.
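   A minimal sketch of the proposed pre-check follows. It is a self-contained model rather than a patch against Jena: `SimpleExpr`, `Constant`, `VarRef`, `EvalException` and `coalesce` are hypothetical names, and a row is a `Map<String, String>` instead of a `Binding`. `canEvaluate` is a `default` method, so `Constant`, which does not override it, still compiles and behaves as before. `coalesce` skips arguments whose pre-check fails and keeps the try/catch for the rest.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class PrecheckCoalesce {

       // Hypothetical stand-in for Jena's ExprEvalException.
       static class EvalException extends RuntimeException {
           EvalException(String message) { super(message); }
       }

       // Hypothetical minimal expression interface mirroring the proposal.
       interface SimpleExpr {
           String eval(Map<String, String> row);

           // Default keeps existing implementations source-compatible.
           default boolean canEvaluate(Map<String, String> row) {
               return true;
           }
       }

       // An "existing" implementation that does not override canEvaluate.
       static class Constant implements SimpleExpr {
           private final String value;
           Constant(String value) { this.value = value; }
           public String eval(Map<String, String> row) { return value; }
       }

       // Variable reference: overrides canEvaluate with a cheap bound check.
       static class VarRef implements SimpleExpr {
           private final String name;
           VarRef(String name) { this.name = name; }

           public String eval(Map<String, String> row) {
               String v = row.get(name);
               if (v == null) throw new EvalException("Unbound: " + name);
               return v;
           }

           @Override
           public boolean canEvaluate(Map<String, String> row) {
               return row != null && row.containsKey(name);
           }
       }

       // COALESCE with a pre-check: skip arguments known to fail, but still
       // catch errors from arguments whose pre-check was optimistic.
       static String coalesce(Map<String, String> row, SimpleExpr... args) {
           for (SimpleExpr arg : args) {
               if (!arg.canEvaluate(row)) {
                   continue; // e.g. unbound variable: skipped, nothing thrown
               }
               try {
                   return arg.eval(row);
               } catch (EvalException e) {
                   // pre-check passed but evaluation failed; try the next one
               }
           }
           throw new EvalException("COALESCE: no argument evaluated");
       }

       public static void main(String[] args) {
           Map<String, String> row = new HashMap<>();
           row.put("fullname", "Ada Lovelace");

           // ?nickname is unbound, so it is skipped without throwing.
           System.out.println(coalesce(row, new VarRef("nickname"), new VarRef("fullname")));
           // Falls back to a constant when the variable is unbound.
           System.out.println(coalesce(new HashMap<>(), new VarRef("nickname"), new Constant("anonymous")));
       }
   }
   ```

   Because the try/catch is kept, the pre-check is purely an optimisation. A conservative `true` never changes results; it only means the exception path may still be taken for that argument.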
   
   ### Are you interested in contributing a solution yourself?
   
   Perhaps?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
