> I just made a few benchmarks with NiFi to compare it to another solution.

Raw performance is only one consideration when choosing an ETL or data
orchestration tool. NiFi has some critical competitive advantages, such
as how aggressively it protects the contents of the data flow from external
failure (e.g., someone killing the JVM doesn't corrupt hours of work) and how
easy it is to harden** on the security side. Plus, unlike many tools in this
space, it is very agile: you can stop a job at any time and inspect its
inputs and outputs.

** NiFi is now emerging as the de facto standard for data engineering in
the US government market, in part because properly hardening it is closer
to something a well-motivated intern can do than to something that requires
a "seasoned professional."

On Sun, Feb 23, 2020 at 3:36 PM Marc Pellmann <[email protected]> wrote:

> Hi,
>
>
> I am interested in some insight into timer-driven vs. event-driven
> scheduling and the future plans for event-driven scheduling.
>
>
> I just made a few benchmarks with NiFi to compare it to another solution.
>
>
> The flows primarily consist of synchronous Web Service/REST like calls. So
> I use HandleHttpRequest/HandleHttpResponse. In the concrete example I just
> have two processors in between - a ReplaceText and a TransformXml.
>
>
> On the client side I use JMeter to generate the load (just POST calls
> with a few bytes of content).
>
>
> First I tested this with standard values, which means timer driven
> scheduling strategy and 1 task.
>
>
> The numbers from these tests were not very impressive, so I played with the
> configuration and set the scheduling strategy to event driven (with a task
> value of 0 and a maximum event-driven thread count of 1). This could only be
> done for the two processors in between and not for
> HandleHttpRequest/HandleHttpResponse, since they do not allow such a
> configuration.
>
>
> This increased the throughput by a factor of 6.
>
>
> I also tried to increase the throughput with some other configurations,
> such as more tasks or different run durations, but this did not change the
> values significantly.
>
>
> So at least for this type of scenario, the event-driven configuration is
> much better. On the other hand, it is still experimental, and according
> to some posts it is not seen as a good option and sounds like something
> that might be removed.
>
>
> Why is this?
>
>
> Also, I would expect an event-driven configuration option for
> HandleHttpRequest, since the arrival of an HTTP request is already an event.
>
>
> Best regards,
>
> Marc
>
