litao91 opened a new issue #7790: Repeatedly parsing expression in ExpressionPostAggregator wastes a lot of CPU time and potentially hits performance
URL: https://github.com/apache/incubator-druid/issues/7790

### Affected Version

0.10, after `ExpressionPostAggregator` was introduced

### Description

A flame graph sampled from one of our Historical nodes in production is attached.

It shows that `Expr.parse` accounts for almost 2/3 of the CPU time spent on query execution.

The parsing happens in the constructor of `ExpressionPostAggregator`, which is reached through `decorate`:

```java
private ExpressionPostAggregator(
    final String name,
    final String expression,
    @Nullable final String ordering,
    final ExprMacroTable macroTable,
    final Map<String, Function<Object, Object>> finalizers
)
{
  Preconditions.checkArgument(expression != null, "expression cannot be null");

  this.name = name;
  this.expression = expression;
  this.ordering = ordering;
  this.comparator = ordering == null ? DEFAULT_COMPARATOR : Ordering.valueOf(ordering);
  this.macroTable = macroTable;
  this.finalizers = finalizers;

  // The expression is parsed eagerly every time an instance is constructed.
  this.parsed = Parser.parse(expression, macroTable);
  this.dependentFields = ImmutableSet.copyOf(Parser.findRequiredBindings(parsed));
}
```

This is weird, as theoretically the parsing should happen only once per query. After digging into the code a little, I found that Druid invokes it at least once per segment, each time a runner executes. So perhaps we can fix this issue by adding a cache to `ExpressionPostAggregator`?
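
The caching idea raised in the description could be sketched roughly as follows. This is a minimal illustration, not Druid code: `ParsedExpressionCache` is a hypothetical helper, and the lambda in `main` is only a stand-in for `Parser.parse(expression, macroTable)` that counts how often it runs. It assumes the parsed result depends only on the expression string; a real fix would probably also need to account for the macro table and bound the cache size.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper (not part of Druid): memoizes the result of an expensive parse.
final class ParsedExpressionCache<T> {
  private final ConcurrentMap<String, T> cache = new ConcurrentHashMap<>();
  private final Function<String, T> parser;

  ParsedExpressionCache(Function<String, T> parser) {
    this.parser = parser;
  }

  // Parses each distinct expression string at most once; later calls reuse the result.
  // The parser must not return null, otherwise nothing is cached for that key.
  T get(String expression) {
    return cache.computeIfAbsent(expression, parser);
  }

  // Number of distinct expressions currently cached.
  int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    AtomicInteger parseCalls = new AtomicInteger();

    // Stand-in for the real parser: records every invocation.
    ParsedExpressionCache<String> cache = new ParsedExpressionCache<>(expr -> {
      parseCalls.incrementAndGet();
      return "parsed(" + expr + ")";
    });

    // Simulate the same post-aggregator expression being requested once per segment.
    for (int segment = 0; segment < 1000; segment++) {
      cache.get("a + b");
    }

    System.out.println(parseCalls.get()); // prints 1
  }
}
```

With a cache like this, the per-segment cost drops from a full parse to a map lookup. An unbounded map is the simplest choice for a sketch, but it can grow without limit when many distinct expressions are queried.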
