litao91 opened a new issue #7790: Repeatedly parsing expression in ExpressionPostAggregator wastes a lot of CPU time and potentially hits performance
URL: https://github.com/apache/incubator-druid/issues/7790
 
 
   ### Affected Version
   
   0.10 onward, i.e. every version since `ExpressionPostAggregator` was introduced
   
   ### Description
   
   Attached is a flame graph sampled from one of our Historical nodes in the production environment.
   
   
![15590237771864](https://user-images.githubusercontent.com/1422365/58553386-675dcc00-8247-11e9-892e-12890a586e2d.png)
   
   It shows that `Expr.parse` accounts for almost 2/3 of the CPU time spent on query execution:
   
   
![image](https://user-images.githubusercontent.com/1422365/58553584-d5a28e80-8247-11e9-8bad-b2828481114e.png)
   
   The parsing happens in the `ExpressionPostAggregator` constructor, which is called by `decorate`:
   
   ```java
     private ExpressionPostAggregator(
         final String name,
         final String expression,
         @Nullable final String ordering,
         final ExprMacroTable macroTable,
         final Map<String, Function<Object, Object>> finalizers
     )
     {
       Preconditions.checkArgument(expression != null, "expression cannot be null");
   
       this.name = name;
       this.expression = expression;
       this.ordering = ordering;
       this.comparator = ordering == null ? DEFAULT_COMPARATOR : Ordering.valueOf(ordering);
       this.macroTable = macroTable;
       this.finalizers = finalizers;
   
       this.parsed = Parser.parse(expression, macroTable);
       this.dependentFields = ImmutableSet.copyOf(Parser.findRequiredBindings(parsed));
     }
   ```
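A minimal sketch of one way to avoid the repeated work, assuming the parsed tree is immutable and safe to share: `decorate` hands the already-parsed expression to the private constructor instead of parsing the string again. `ReuseParsedExpr`, `PostAgg`, `Expr`, and `parse` are hypothetical, simplified stand-ins rather than Druid classes, and `decorate` here takes the finalizers map directly, which simplifies the real method's signature.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical, simplified model of ExpressionPostAggregator; not Druid code.
public class ReuseParsedExpr
{
  // Stand-in for Druid's parsed expression tree (assumed immutable).
  static final class Expr
  {
    final String source;

    Expr(String source)
    {
      this.source = source;
    }
  }

  // Counts how many times the (expensive) parser actually runs.
  static final AtomicInteger PARSE_CALLS = new AtomicInteger();

  // Stand-in for Parser.parse(expression, macroTable).
  static Expr parse(String expression)
  {
    PARSE_CALLS.incrementAndGet();
    return new Expr(expression);
  }

  static final class PostAgg
  {
    final String name;
    final String expression;
    final Expr parsed;
    final Map<String, Function<Object, Object>> finalizers;

    // Public entry point: the only place the expression string is parsed.
    PostAgg(String name, String expression)
    {
      this(name, expression, parse(expression), Map.of());
    }

    // Private path: accepts a tree that was already parsed.
    private PostAgg(
        String name,
        String expression,
        Expr parsed,
        Map<String, Function<Object, Object>> finalizers
    )
    {
      this.name = name;
      this.expression = expression;
      this.parsed = parsed;
      this.finalizers = finalizers;
    }

    // Returns a copy with new finalizers, reusing the existing parse tree.
    PostAgg decorate(Map<String, Function<Object, Object>> newFinalizers)
    {
      return new PostAgg(name, expression, parsed, newFinalizers);
    }
  }

  public static void main(String[] args)
  {
    PostAgg postAgg = new PostAgg("p", "x + y * 2");
    // Simulate decorate() being called once per segment.
    for (int segment = 0; segment < 100; segment++) {
      postAgg.decorate(Map.of());
    }
    System.out.println(PARSE_CALLS.get()); // prints 1
  }
}
```

This keeps the parse tree attached to the instance, so nothing has to be evicted; it only helps if the repeated constructions go through `decorate` on an already-parsed instance, as the snippet above suggests.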
   
   This is weird, since in theory the parsing should happen only once per query.
   
   Digging into the code a little, I found that Druid invokes it at least once per segment, as each segment's runner executes.
   
   So perhaps we could fix this issue by adding a cache in `ExpressionPostAggregator`?
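A minimal sketch of the cache idea, assuming parsed expressions are immutable (so sharing them across threads is safe) and that the macro table can be compared by identity. `ParsedExpressionCache`, `parseCached`, and the `Expr`/`MacroTable`/`parse` stand-ins are hypothetical names for illustration, not Druid APIs. The key includes the macro table because the parse call shown above takes both the expression string and the macro table as inputs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a parse cache; not Druid code.
public class ParsedExpressionCache
{
  // Stand-in for Druid's parsed expression tree (assumed immutable).
  static final class Expr
  {
    final String source;

    Expr(String source)
    {
      this.source = source;
    }
  }

  // Stand-in for ExprMacroTable; only its identity is used in the cache key.
  static final class MacroTable
  {
  }

  // Counts how many times the (expensive) parser actually runs.
  static final AtomicInteger PARSE_CALLS = new AtomicInteger();

  // Stand-in for Parser.parse(expression, macroTable).
  static Expr parse(String expression, MacroTable macroTable)
  {
    PARSE_CALLS.incrementAndGet();
    return new Expr(expression);
  }

  // Cache key: expression text plus the identity of the macro table.
  private static final class Key
  {
    final String expression;
    final MacroTable macroTable;

    Key(String expression, MacroTable macroTable)
    {
      this.expression = expression;
      this.macroTable = macroTable;
    }

    @Override
    public boolean equals(Object o)
    {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return expression.equals(other.expression) && macroTable == other.macroTable;
    }

    @Override
    public int hashCode()
    {
      return Objects.hash(expression, System.identityHashCode(macroTable));
    }
  }

  // Unbounded, thread-safe cache shared by all post-aggregator instances.
  private static final Map<Key, Expr> CACHE = new ConcurrentHashMap<>();

  // Parses on the first request for a key; later requests return the cached tree.
  static Expr parseCached(String expression, MacroTable macroTable)
  {
    return CACHE.computeIfAbsent(new Key(expression, macroTable), k -> parse(k.expression, k.macroTable));
  }

  public static void main(String[] args)
  {
    MacroTable macros = new MacroTable();
    // Simulate one post-aggregator being constructed per segment.
    for (int segment = 0; segment < 100; segment++) {
      parseCached("x + y * 2", macros);
    }
    System.out.println(PARSE_CALLS.get()); // prints 1
  }
}
```

As written the cache is unbounded, so in a long-running Historical process it would grow with every distinct expression seen; a real implementation would probably want a size bound or expiry (for example via Guava's `CacheBuilder`).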
