"NiFi is now emerging as the de facto standard for data engineering in
the government market in the US in part because properly hardening it is
closer to something a well-motivated intern can do than requiring a
'seasoned professional.'"
Is there any way to prove this? Sounds interesting.


On Sun, Feb 23, 2020, 17:08, Mike Thomsen <[email protected]> wrote:

> > I just made a few benchmarks with NiFi to compare it to another solution.
>
> Raw performance is only one consideration when choosing an ETL or data
> orchestration tool. NiFi has some critical competitive advantages, such
> as how aggressively it protects the contents of the data flow from external
> failure (e.g., someone killing the JVM doesn't corrupt hours of work) and how
> easy it is to deeply harden** it on the security side. Plus, unlike many
> tools in this space, it's very agile: you can stop a job at any time and
> inspect its inputs and outputs.
>
> ** NiFi is now emerging as the de facto standard for data engineering in
> the government market in the US in part because properly hardening it is
> closer to something a well-motivated intern can do than requiring a
> "seasoned professional."
>
> On Sun, Feb 23, 2020 at 3:36 PM Marc Pellmann <[email protected]> wrote:
>
> > Hi,
> >
> >
> > I am interested in some insight into timer-driven vs. event-driven
> > scheduling and the future plans for event-driven scheduling.
> >
> >
> > I just made a few benchmarks with NiFi to compare it to another solution.
> >
> >
> > The flows primarily consist of synchronous Web Service/REST-like calls, so
> > I use HandleHttpRequest/HandleHttpResponse. In this concrete example I just
> > have two processors in between: a ReplaceText and a TransformXml.
> >
> >
> > On the client side I use JMeter to generate the load (just POST calls
> > with a few bytes of content).
> >
> >
> > First I tested this with the default values, i.e., the timer-driven
> > scheduling strategy and 1 concurrent task.
> >
> >
> > The numbers from these tests were not very impressive, so I played with
> > the configuration and set the scheduling strategy to event-driven (with
> > concurrent tasks set to 0 and a maximum event-driven thread count of 1).
> > This could only be done for the two processors in between, not for
> > HandleHttpRequest/HandleHttpResponse, since they do not allow such a
> > configuration.
> >
> >
> > This increased the throughput by a factor of 6.
> >
> >
> > I also tried to increase the throughput with other configurations, such
> > as more concurrent tasks or different run durations, but this did not
> > change the values significantly.
> >
> >
> > So at least for this type of scenario, the event-driven configuration is
> > much better. On the other hand, it is still experimental, and according
> > to some posts it is not seen as a good option and sounds like something
> > that might be removed.
> >
> >
> > Why is this?
> >
> >
> > I would also expect an event-driven configuration option for
> > HandleHttpRequest, since an incoming HTTP request is already an event.
> >
> >
> > Best regards,
> >
> > Marc
> >
>
>
