Re: [PR] Reduce amount of expression objects created during evaluations (druid)

via GitHub Sat, 06 Jan 2024 14:29:25 -0800


gianm commented on PR #15552:
URL: https://github.com/apache/druid/pull/15552#issuecomment-1879849582


   > create the ExprEval early for ConstantExpr -s (except the one for 
BigInteger which seem to have some odd contract)
   
   This part is problematic since `ExprEval` is not thread-safe. For example, 
`NestedDataExprEval#asDouble` calls `computeNumber` which has race problems if 
called from two threads simultaneously:
   
   ```
       private void computeNumber()
       {
         if (!computedNumber && value != null) {
           computedNumber = true;
           Object val = StructuredData.unwrap(value);
           if (val instanceof Number) {
             number = (Number) val;
           } else if (val instanceof Boolean) {
             number = Evals.asLong((Boolean) val);
           } else if (val instanceof String) {
             number = ExprEval.computeNumber((String) val);
           }
         }
       }
   ```
   
   To safely cache the `ExprEval` we have a couple of options.
   
   One is to ensure that the `ExprEval` stashed in `ConstantExpr` are 
immutable. Perhaps by adding a method on `ExprEval` that is used to ensure all 
the caches for `asDouble()`, `asString()`, etc, are populated. Once those are 
all populated then the object becomes thread-safe.
   
   Another is to add a `singleThreaded()` method to make a thread-unsafe + 
optimized version, like the technique used by `GenericIndexed#singleThreaded`. 
The equivalent here would be something like `Expr#singleThreaded`, and _that_ 
would be the one that creates the `ExprEval` (in a copy of the `Expr`). Callers 
that are about to call `expr.eval(bindings)` in a hot loop would then first do 
`singleThreadedExpr = expr.singleThreaded()`, then do 
`singleThreadedExpr.eval(bindings)`. Callers that _don't_ do this will still 
function properly, but they won't get the stashed eval optimization. Sketch of 
the approach:
   
   ```java
   class DoubleExpr extends ConstantExpr<Double>
   {
     @Nullable
     private final ExprEval<?> eval;
   
     // constructor used by Parser
     DoubleExpr(Double value)
     {
       super(ExpressionType.DOUBLE, Preconditions.checkNotNull(value, "value"));
     }
   
     // constructor used by singleThreaded
     DoubleExpr(ExprEval<?> eval)
     {
       super(ExpressionType.DOUBLE, Preconditions.checkNotNull(eval.value(), 
"value"));
     }
   
     @Override // overrides method from Expr
     public Expr singleThreaded()
     {
       return new DoubleExpr(ExprEval.ofDouble(value));
     }
   
     @Override
     public ExprEval eval(ObjectBindings bindings)
     {
       // thread-safe no matter what, but more optimized if someone had called 
singleThreaded()
       return eval != null ? eval : ExprEval.ofDouble(value);
     }
   }
   ```
   
   I'm partial to the second one since I am thinking we might want to add a 
`Utf8ExprEval` in the future that has a `ByteBuffer` value, and that would 
benefit from the second approach (we can call `ByteBuffer#duplicate` in 
`singleThreaded()` and avoid calling it during `eval`) but would not benefit 
from the first approach (we would need to call `ByteBuffer#duplicate` in 
`eval`, since ByteBuffer cannot ever be shared across threads, due to the 
builtin `position` being changed by various operations).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Reduce amount of expression objects created during evaluations (druid)

Reply via email to