arne-bdt opened a new issue, #3323:
URL: https://github.com/apache/jena/issues/3323
### Version
5.6.0-SNAPSHOT
### Feature
We've observed that the SPARQL COALESCE function in Jena is
significantly slower than an equivalent IF(BOUND(...), ..., ...) pattern. This
appears to be because COALESCE's implementation relies on exception handling to
detect unbound variables.
**Steps to Reproduce:**
Consider these two functionally equivalent queries:
Query 1 (using COALESCE - slower):
```sparql
PREFIX : <http://example.org/>  # placeholder namespace, not given in the report

SELECT ?person (COALESCE(?nickname, ?fullname) AS ?displayName)
WHERE {
  ?person :hasFullName ?fullname .
  OPTIONAL { ?person :hasNickname ?nickname . }
}
```
Query 2 (using IF/BOUND - faster):
```sparql
PREFIX : <http://example.org/>  # placeholder namespace, not given in the report

SELECT ?person ?displayName
WHERE {
  ?person :hasFullName ?fullname .
  OPTIONAL { ?person :hasNickname ?nickname . }
  BIND(IF(BOUND(?nickname), ?nickname, ?fullname) AS ?displayName)
}
```
Both queries return the nickname if available, otherwise the full name.
However, Query 2 performs significantly better than Query 1.
**Expected Behavior:**
COALESCE should have comparable performance to the IF(BOUND(...), ..., ...)
pattern for simple cases involving unbound variables.
**Actual Behavior:**
COALESCE is noticeably slower, particularly when its leading arguments are
frequently unbound.
**Analysis:**
Based on my understanding, COALESCE evaluates each argument in turn and catches
any `ExprEvalException` before moving on to the next one. The cost of raising
and catching these exceptions appears to be the cause of the performance
degradation, especially with OPTIONAL patterns, where variables are often
unbound.
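To make the suspected pattern concrete, here is a minimal, self-contained sketch. It does not use Jena's classes: `EvalException`, `evalVar`, `coalesce`, and `ifBound` are hypothetical stand-ins, and a plain `Map` plays the role of a binding. It only illustrates the difference between "try, catch, move on" and "check, then evaluate"; it is not the actual Jena code.

```java
import java.util.List;
import java.util.Map;

final class CoalesceByExceptionSketch {
    /** Hypothetical stand-in for Jena's ExprEvalException. */
    static class EvalException extends RuntimeException {
        EvalException(String msg) { super(msg); }
    }

    /** Evaluates a variable; an unbound variable is reported by throwing. */
    static String evalVar(String name, Map<String, String> binding) {
        String value = binding.get(name);
        if (value == null) {
            throw new EvalException("Not bound: " + name);
        }
        return value;
    }

    /** COALESCE in the exception-driven style: each failing argument costs a throw and a catch. */
    static String coalesce(List<String> vars, Map<String, String> binding) {
        for (String v : vars) {
            try {
                return evalVar(v, binding);
            } catch (EvalException e) {
                // Argument failed to evaluate: fall through to the next one.
            }
        }
        throw new EvalException("COALESCE: no argument could be evaluated");
    }

    /** IF(BOUND(?a), ?a, ?b) style: the bound check means no exception is raised when ?a is unbound. */
    static String ifBound(String a, String b, Map<String, String> binding) {
        return binding.containsKey(a) ? evalVar(a, binding) : evalVar(b, binding);
    }
}
```

For a row with only `fullname` bound, both `coalesce(List.of("nickname", "fullname"), row)` and `ifBound("nickname", "fullname", row)` return the full name, but only the first raises and catches an exception on the way.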
**Suggested Solution:**
Extend `org.apache.jena.sparql.expr.Expr` with a method to allow fast
pre-evaluation checks:
```java
// Expr is an interface, so a method with a body must be declared `default`.
default boolean canEvaluate(Binding binding, FunctionEnv env) {
    return true; // default implementation for backward compatibility
}
```
Then implementations like `ExprVar` could override it to perform a fast
bound check:
```java
@Override
public boolean canEvaluate(Binding binding, FunctionEnv env) {
    // asVar() returns the Var wrapped by this ExprVar.
    return binding != null && binding.contains(asVar());
}
```
This would allow COALESCE to pre-check expressions before attempting
evaluation, avoiding the exception handling overhead for unbound variables. The
COALESCE implementation could then use this method to skip expressions that
would throw exceptions, only evaluating those that can succeed.
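A rough sketch of how such a loop might look, again without Jena's classes: `MiniExpr`, `VarExpr`, `EvalException`, and the `exceptionsRaised` counter are hypothetical names introduced only for illustration, and a `Map` stands in for `Binding`. The `try`/`catch` is kept on purpose, on the assumption that `canEvaluate` returning `true` is only a hint and evaluation may still fail for other reasons (for example a type error).

```java
import java.util.List;
import java.util.Map;

final class CoalescePrecheckSketch {
    /** Hypothetical stand-in for Jena's ExprEvalException. */
    static class EvalException extends RuntimeException {
        EvalException(String msg) { super(msg); }
    }

    /** Counts exceptions raised by VarExpr, purely to make the effect observable. */
    static int exceptionsRaised = 0;

    /** Hypothetical miniature of Expr with the proposed pre-check. */
    interface MiniExpr {
        String eval(Map<String, String> binding);

        /** Default keeps existing implementations working: "may be evaluable". */
        default boolean canEvaluate(Map<String, String> binding) {
            return true;
        }
    }

    /** Variable expression: overrides the pre-check with a cheap bound test. */
    static final class VarExpr implements MiniExpr {
        private final String name;

        VarExpr(String name) { this.name = name; }

        @Override
        public String eval(Map<String, String> binding) {
            String value = binding.get(name);
            if (value == null) {
                exceptionsRaised++;
                throw new EvalException("Not bound: " + name);
            }
            return value;
        }

        @Override
        public boolean canEvaluate(Map<String, String> binding) {
            return binding != null && binding.containsKey(name);
        }
    }

    /** COALESCE that skips arguments known to fail, and still catches the rest. */
    static String coalesce(List<MiniExpr> args, Map<String, String> binding) {
        for (MiniExpr arg : args) {
            if (!arg.canEvaluate(binding)) {
                continue; // known to fail: skip without raising an exception
            }
            try {
                return arg.eval(binding);
            } catch (EvalException e) {
                // The pre-check is only a hint; other evaluation errors still land here.
            }
        }
        throw new EvalException("COALESCE: no argument could be evaluated");
    }
}
```

With this shape, an unbound `?nickname` is skipped by the bound test alone, while expressions that do not override `canEvaluate` behave exactly as they do with the exception-driven loop.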
### Are you interested in contributing a solution yourself?
Perhaps?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]