"NiFi is now emerging as the de facto standard for data engineering in the government market in the US in part because properly hardening it is closer to something a well-motivated intern can do than requiring a "seasoned professional."" Is there any way to prove this? Sounds interesting.
Mike Thomsen <[email protected]> wrote on Sun, Feb 23, 2020, 17:08:

> > I just made a few benchmarks with NiFi to compare it to another solution.
>
> Raw performance is only one consideration when choosing an ETL or data
> orchestration tool. NiFi has some very critical competitive advantages, such
> as how aggressively it protects the contents of the data flow from external
> failure (e.g., someone killing the JVM doesn't corrupt hours of work) and how
> easy it is to very deeply harden** it on the security side of things. Plus,
> unlike many tools in this space, it's very agile in being able to stop a job
> at any time and inspect the inputs and outputs.
>
> ** NiFi is now emerging as the de facto standard for data engineering in
> the government market in the US in part because properly hardening it is
> closer to something a well-motivated intern can do than requiring a
> "seasoned professional."
>
> On Sun, Feb 23, 2020 at 3:36 PM Marc Pellmann <[email protected]> wrote:
>
> > Hi,
> >
> > I am interested in some insight into timer driven vs. event driven
> > scheduling and the future plans for event driven.
> >
> > I just made a few benchmarks with NiFi to compare it to another solution.
> >
> > The flows primarily consist of synchronous Web Service/REST-like calls, so
> > I use HandleHttpRequest/HandleHttpResponse. In the concrete example I just
> > have two processors in between - a ReplaceText and a TransformXml.
> >
> > On the client side I use JMeter to generate the load (just POST calls
> > with a few bytes of content).
> >
> > First I tested this with the standard values, which means the timer driven
> > scheduling strategy and 1 task.
> >
> > The numbers from these tests were not very impressive, so I played with the
> > configuration and set the scheduling strategy to event driven (with a task
> > value of 0 and a maximum event driven thread count of 1). This could only
> > be done for the two processors in between and not for
> > HandleHttpRequest/HandleHttpResponse, since they do not allow such a
> > configuration.
> >
> > This increased the throughput by a factor of 6.
> >
> > I also tried to increase the throughput with some other configurations,
> > such as more tasks or different run durations, but this did not change the
> > values significantly.
> >
> > So at least for this type of scenario, the event driven configuration is
> > much better. But on the other hand it is still experimental, and according
> > to some posts it is not seen as a good option and sounds more like
> > something that might be removed.
> >
> > Why is this?
> >
> > Also, I would expect an event driven configuration option for
> > HandleHttpRequest, since the arrival of an HTTP request is already an
> > event.
> >
> > Best regards,
> >
> > Marc
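
As a rough illustration of why a synchronous request/response flow like the one Marc describes might be sensitive to the scheduling strategy, here is a toy model in Python. It is not NiFi's scheduler: `next_tick`, `run_sync_client`, the 10 ms polling grid, and the 1 ms of work per processor are all hypothetical and chosen only for illustration. It assumes a single client that waits for each response before sending the next request, and two intermediate processors.

```python
def next_tick(t_ms, poll_ms):
    """Earliest polling instant at or after t_ms on a fixed grid of poll_ms."""
    return ((t_ms + poll_ms - 1) // poll_ms) * poll_ms


def run_sync_client(n_requests, n_stages, work_ms, poll_ms=None):
    """Simulated time (ms) for one client sending requests back to back.

    poll_ms=None models an event-driven stage that starts as soon as its
    input arrives. An integer poll_ms models a timer-driven stage that only
    looks at its queue on a fixed polling grid (a simplifying assumption).
    """
    t = 0
    for _ in range(n_requests):
        for _ in range(n_stages):
            if poll_ms is not None:
                # The item waits in the queue until the next polling instant.
                t = next_tick(t, poll_ms)
            t += work_ms  # time the stage spends doing the actual work
    return t


if __name__ == "__main__":
    event_ms = run_sync_client(100, n_stages=2, work_ms=1)
    timer_ms = run_sync_client(100, n_stages=2, work_ms=1, poll_ms=10)
    print(f"event driven: {event_ms} ms, timer driven: {timer_ms} ms")
```

With these made-up numbers the event-driven variant finishes much sooner, because no request waits for a polling tick. The actual factor in a real flow depends on NiFi's scheduler settings and is not predicted by this sketch.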
