Not with hard numbers, but when you look at job reqs and proposals it's ***everywhere***. I also can't remember the last time I saw a data engineering demo or discussion where NiFi or StreamSets wasn't the foundation.

On Sun, Feb 23, 2020 at 4:21 PM Martin Ebert <[email protected]> wrote:

> "NiFi is now emerging as the de facto standard for data engineering in
> the government market in the US in part because properly hardening it is
> closer to something a well-motivated intern can do than requiring a
> "seasoned professional.""
>
> Is there any way to prove this? Sounds interesting.
>
> Mike Thomsen <[email protected]> wrote on Sun, Feb 23, 2020, 17:08:
>
> > > I just made a few benchmarks with NiFi to compare it to another
> > > solution.
> >
> > Raw performance is only one consideration when choosing an ETL or data
> > orchestration tool. NiFi has some very critical competitive advantages,
> > such as how aggressively it protects the contents of the data flow from
> > external failure (e.g., someone killing the JVM doesn't corrupt hours of
> > work) and how easy it is to very deeply harden** it on the security side
> > of things. Plus, unlike many tools in this space, it's very agile in
> > being able to stop a job at any time and inspect the inputs and outputs.
> >
> > ** NiFi is now emerging as the de facto standard for data engineering in
> > the government market in the US in part because properly hardening it is
> > closer to something a well-motivated intern can do than requiring a
> > "seasoned professional."
> >
> > On Sun, Feb 23, 2020 at 3:36 PM Marc Pellmann <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am interested in some insight into timer driven vs. event driven
> > > and the future plans for event driven.
> > >
> > > I just made a few benchmarks with NiFi to compare it to another
> > > solution.
> > >
> > > The flows primarily consist of synchronous Web Service/REST-like
> > > calls, so I use HandleHttpRequest/HandleHttpResponse. In the concrete
> > > example I just have two processors in between: a ReplaceText and a
> > > TransformXml.
> > >
> > > From the client side I use JMeter to generate the load (just POST
> > > calls with a few bytes of content).
> > >
> > > First I tested this with standard values, which means the timer
> > > driven scheduling strategy and 1 task.
> > >
> > > The numbers from these tests were not very impressive, so I played
> > > with the configuration and set the scheduling strategy to event
> > > driven (with a task value of 0 and a maximum event driven thread count
> > > of 1). This could only be done for the two processors in between and
> > > not for HandleHttpRequest/HandleHttpResponse, since they do not allow
> > > such a configuration.
> > >
> > > This increased the throughput by a factor of 6.
> > >
> > > I also tried to increase the throughput with some other
> > > configurations, such as more tasks or different run durations, but
> > > this did not change the values significantly.
> > >
> > > So at least for this type of scenario, the event driven configuration
> > > is much better. On the other hand, it is still experimental, and
> > > according to some posts it is not seen as a good option and sounds
> > > more like something that might be removed.
> > >
> > > Why is this?
> > >
> > > Also, I would expect an event driven configuration option for
> > > HandleHttpRequest, since the arrival of an HTTP request is already an
> > > event.
> > >
> > > Best regards,
> > >
> > > Marc
