[ https://issues.apache.org/jira/browse/NIFI-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frederik Petersen updated NIFI-6322:
------------------------------------
    Description: 
Hi, 

While doing some CPU sampling in our production environment, we encountered 
some surprising results: during the evaluation of NiFi expressions, the 
modification of a _HashSet_ is the most expensive operation in the whole 
process.

!Selection_094.png!

This seems implausible given all the other processing involved in evaluating 
NiFi expressions. 
 After reviewing the code and doing some more profiling, it looks like this 
_HashSet_ modification is performed far more often than necessary; in 
particular, it is repeated on every single evaluation.

!image.png!
 This profiling output was produced with the following unit test:
{code:java}
@Test
public void testSimple() {
    final TestRunner runner = TestRunners.newTestRunner(new RouteOnAttribute());
    runner.setProperty(RouteOnAttribute.ROUTE_STRATEGY, RouteOnAttribute.ROUTE_ANY_MATCHES.getValue());
    runner.setProperty("filter", "${literal('b'):equals(${a})}");
    for (int i = 0; i < 500; i++) {
        runner.enqueue(new byte[0], new HashMap<String, String>() {{
            put("a", "b");
        }});
    }
    runner.run(500);
}{code}
The key question is: why are the _Evaluator_ objects (and everything built 
around them) constructed twice:
 - once in _ExpressionCompiler.compile()_
 - once again in _CompiledExpression.evaluate()_

In other words: every call to _CompiledExpression.evaluate()_ creates a new 
_ExpressionCompiler_ and repeats the expensive construction. Why not reuse the 
_Evaluator_ objects that were built beforehand and are stored in the 
_CompiledExpression_?
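The reuse suggested above can be sketched as follows. This is a minimal, hypothetical illustration, not NiFi's actual API: the class name, the counter, and the simple attribute lookup standing in for the real evaluator tree are all made up. The point is only that the expensive construction happens once, at compile time, and evaluate() merely walks the prebuilt tree.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a compiled expression that caches its evaluator
// tree instead of rebuilding it on every evaluate() call.
final class ReusableCompiledExpression {
    // Counts how many times the expensive "compilation" ran (for demonstration only).
    static final AtomicInteger compileCount = new AtomicInteger();

    private final Function<Map<String, String>, String> evaluatorTree;

    ReusableCompiledExpression(String attributeName) {
        compileCount.incrementAndGet();
        // The expensive evaluator construction happens exactly once, here.
        this.evaluatorTree = attributes -> attributes.getOrDefault(attributeName, "");
    }

    String evaluate(Map<String, String> attributes) {
        // No new compiler, no new evaluators: just reuse the prebuilt tree.
        return evaluatorTree.apply(attributes);
    }
}
```

With this shape, calling evaluate() 500 times, as the unit test above does, would run the expensive construction once rather than 500 times.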

Is there a specific design decision behind that? It looks like there is room 
for performance improvement, especially for heavily used processors.

On our live system, where we run expensive tasks such as language detection 
and mail parsing, expression language evaluation nevertheless consumes the 
largest share of CPU time.

Thank you very much for looking into this.

 

> Evaluator Objects are rebuilt on every call even when a CompiledExpression is 
> used
> ----------------------------------------------------------------------------------
>
>                 Key: NIFI-6322
>                 URL: https://issues.apache.org/jira/browse/NIFI-6322
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Frederik Petersen
>            Priority: Major
>              Labels: expression-language, performance
>         Attachments: Selection_094.png, image.png
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
