All,

It seems like we get this sort of question a lot, and Simon's answer here was really good. We've had similar discussions for Kafka[1], Storm and Spark[2]. Should we think about adding a comparison to other technologies / applications to the FAQ? Not in a sales sheet sort of way, but in a way that emphasizes how these technologies complement each other. Obviously we don't need to go out and find every comparable technology, but having a place to put answers like Simon's that is easier to reference than the Apache mail archive might be beneficial.

Brandon

[1] https://groups.google.com/forum/#!topic/confluent-platform/JKeccNEhwaQ
[2] http://www.zdnet.com/article/hortonworks-cto-on-apache-nifi-what-is-it-and-why-does-it-matter-to-iot/

On Fri, May 6, 2016 at 6:09 AM Simon Ball <[email protected]> wrote:

> ExecuteSQL can certainly deal with millions of rows. Sqoop currently makes
> more sense if you want to distribute the query processing across a large
> number of nodes (if you have hundreds of millions of rows, 10-100 GB+ or
> TBs of data) and write directly into Hadoop. If you're looking for
> functionality like Sqoop's incremental imports, then check out
> QueryDatabaseTable. As long as you set a sensible fetch size on that
> (1000ish is usually good, but it depends on row size), I've seen very small
> NiFi instances (AWS t2.small) cope with a few million rows in the order of
> 10 seconds.
>
> SpringXD is really a different beast to NiFi. It's a code->deploy pattern
> rather than a command-and-control pattern for data flow. Once you deploy a
> SpringXD flow, it's fixed (more like Spark, Storm etc.: compile, deploy,
> never change). SpringXD recently added some visual design, but Flo is
> primarily a retrospective development environment (monitor a flow, not
> design it).
>
> NiFi also runs out to the edge and gets the data. SpringXD runs in a core
> cluster (e.g. on YARN). So in this scenario, SpringXD is more like Beam or
> Spark Streaming. NiFi, however, with site-to-site can be used to run right
> out at the edge, secure and transport data, and track it from origin. This
> means NiFi is actually a complement to technology like SpringXD and Beam.
> NiFi feeds these heavier-weight streaming frameworks: it handles the data
> movement and simple event processing, then passes the data on for more
> complex analytics with the likes of XD.
>
> So in short, the technologies are complementary. NiFi has the edge in
> reaching out to collect data; XD may be better for complex analytics.
>
> Simon
>
>
> > On May 6, 2016, at 6:04 AM, nehakaushik86 <[email protected]> wrote:
> >
> > Hi,
> >
> > We are designing a system where we need a data ingestion framework. The
> > data will be consumed from various data systems - DB, social feeds, text
> > files, CRM etc. Can you let me know how Apache NiFi fares compared to
> > Spring XD, and what are the best use cases where it should be used?
> >
> > Also, I would like to understand the difference between Apache NiFi's
> > ExecuteSQL and Apache Sqoop. We are planning to ingest a huge amount of
> > data from a DB - millions of records. Will ExecuteSQL be able to load
> > such a huge volume?
> >
> > --
> > View this message in context:
> > http://apache-nifi-developer-list.39713.n7.nabble.com/Apache-Nifi-Vs-Spring-XD-which-one-is-better-tp9963.html
> > Sent from the Apache NiFi Developer List mailing list archive at
> > Nabble.com.
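
For anyone reading this thread later, the incremental-import idea mentioned in Simon's reply (remember the largest value seen in some column, query only newer rows, and pull them in batches of a fixed fetch size) can be sketched outside NiFi. This is not NiFi's or Sqoop's implementation, just a minimal illustration using Python's standard sqlite3 module. The `events` table, its `id` and `payload` columns, the `state` dict, and both function names are hypothetical and chosen only for the example.

```python
import sqlite3


def fetch_new_rows(conn, last_max_id, fetch_size=1000):
    """Yield batches of rows whose id is greater than last_max_id.

    fetch_size caps how many rows are pulled from the cursor per batch
    (hypothetical default of 1000, echoing the "1000ish" in the thread).
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_max_id,),
    )
    while True:
        batch = cur.fetchmany(fetch_size)
        if not batch:
            break
        yield batch


def incremental_import(conn, state, fetch_size=1000):
    """Import only rows added since the previous run.

    state is a plain dict holding the largest id seen so far; a real
    system would persist this between runs (assumption for the sketch).
    """
    imported = []
    for batch in fetch_new_rows(conn, state["max_id"], fetch_size):
        imported.extend(batch)
        # Rows are ordered by id, so the last row carries the new maximum.
        state["max_id"] = batch[-1][0]
    return imported


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(f"row-{i}",) for i in range(2500)],
    )

    state = {"max_id": 0}
    first = incremental_import(conn, state)   # picks up all existing rows
    conn.execute("INSERT INTO events (payload) VALUES ('late arrival')")
    second = incremental_import(conn, state)  # picks up only the new row
    print(len(first), len(second), state["max_id"])
```

The second call returns only rows inserted after the first call, which is the behaviour the thread describes as "incremental imports"; the fetch size only controls how many rows are held in memory per batch, not which rows are selected.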
