[
https://issues.apache.org/jira/browse/NIFI-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-6322:
-----------------------------
Fix Version/s: 1.10.0
Status: Patch Available (was: Open)
> Evaluator Objects are rebuilt on every call even when a CompiledExpression is
> used
> ----------------------------------------------------------------------------------
>
> Key: NIFI-6322
> URL: https://issues.apache.org/jira/browse/NIFI-6322
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.9.2
> Reporter: Frederik Petersen
> Priority: Major
> Labels: expression-language, performance
> Fix For: 1.10.0
>
> Attachments: Selection_094.png, image.png
>
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> Hi,
> While doing some CPU sampling in our production environment, we encountered
> some strange results. It seems that, during the evaluation of NiFi
> expressions, the modification of a _HashSet_ is the most expensive operation
> in the whole process.
> !Selection_094.png!
> This seems unrealistic considering all the other processing involved in
> evaluating NiFi expressions.
> After reviewing the code and doing some profiling, it looks like this
> _HashSet_ modification is performed far more often than required: it is
> done on every single evaluation.
> !image.png!
> This profiling output was produced with the following unit test:
> {code:java}
> @Test
> public void testSimple() {
>     final TestRunner runner = TestRunners.newTestRunner(new RouteOnAttribute());
>     runner.setProperty(RouteOnAttribute.ROUTE_STRATEGY,
>             RouteOnAttribute.ROUTE_ANY_MATCHES.getValue());
>     runner.setProperty("filter", "${literal('b'):equals(${a})}");
>     for (int i = 0; i < 500; i++) {
>         runner.enqueue(new byte[0], new HashMap<String, String>() {{
>             put("a", "b");
>         }});
>     }
>     runner.run(500);
> }{code}
> The key question is: why are the _Evaluator_ objects (and everything
> related to them) built twice?
> - Once in _ExpressionCompiler.compile()_
> - Once again in _CompiledExpression.evaluate()_
> In other words: every call to _CompiledExpression.evaluate()_ creates a new
> _ExpressionCompiler_ and makes expensive calls. Why not simply reuse the
> _Evaluator_ objects that were created beforehand and are stored in the
> _CompiledExpression_?
> Is there a specific design decision behind that? It looks like there is room
> for performance improvement, especially for heavily used processors.
> On our live system, where we perform expensive tasks such as language
> detection and mail parsing, expression language evaluation still consumes
> the largest share of CPU time.
> Thank you very much for looking into this.
>
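The reuse question raised in the description can be illustrated with a minimal, self-contained sketch. It does not use NiFi's classes: `buildEvaluator`, `RebuildingExpression`, and `CachingExpression` are hypothetical names invented for illustration, and the predicate is only a stand-in for a real Evaluator tree. The sketch also assumes the evaluator holds no per-call state, which is the condition under which sharing it across calls would be safe.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class CompileOnceSketch {
    // Counts how often the (hypothetical) expensive build step runs.
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Hypothetical stand-in for building an evaluator tree from an expression.
    static Predicate<Map<String, String>> buildEvaluator(String attribute, String expected) {
        BUILDS.incrementAndGet();
        return attrs -> expected.equals(attrs.get(attribute));
    }

    // Rebuilds the evaluator on every call, the pattern the issue describes.
    static final class RebuildingExpression {
        private final String attribute;
        private final String expected;

        RebuildingExpression(String attribute, String expected) {
            this.attribute = attribute;
            this.expected = expected;
        }

        boolean evaluate(Map<String, String> attrs) {
            return buildEvaluator(attribute, expected).test(attrs);
        }
    }

    // Builds the evaluator once and reuses it (assumes it is stateless).
    static final class CachingExpression {
        private final Predicate<Map<String, String>> evaluator;

        CachingExpression(String attribute, String expected) {
            this.evaluator = buildEvaluator(attribute, expected);
        }

        boolean evaluate(Map<String, String> attrs) {
            return evaluator.test(attrs);
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Collections.singletonMap("a", "b");

        RebuildingExpression rebuilding = new RebuildingExpression("a", "b");
        for (int i = 0; i < 500; i++) {
            rebuilding.evaluate(attrs);
        }
        // Read the counter and reset it for the second measurement.
        System.out.println("rebuilding: " + BUILDS.getAndSet(0) + " builds");

        CachingExpression caching = new CachingExpression("a", "b");
        for (int i = 0; i < 500; i++) {
            caching.evaluate(attrs);
        }
        System.out.println("caching: " + BUILDS.get() + " builds");
    }
}
```

With 500 evaluations each, the rebuilding variant runs the build step 500 times and the caching variant runs it once. Whether the real Evaluator objects can be shared this way depends on whether they keep state between calls, which this sketch does not model.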
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)