Ryan - it is way too easy/lazy to throw around FUD about other projects/communities, so we just try to avoid that and focus on what we know here, which is Apache NiFi.
So, you'll probably not find a lot of help here on the Apache NiFi mailing lists for a real comparison, but what we can do is talk about your specific requirements and whether NiFi would be a good fit now and going forward.

To start with, NiFi's goal is not really the direct execution of streaming analytics. For that you want a system that offers full complex event processing capabilities and things like windowing for temporal and spatial correlation (for example). There are good options out there for specifically that purpose, such as Apache Storm, Apache Flink, and others.

Now, Apache NiFi can be used for a great range of powerful streaming data processing cases, and in traditional terms these would be simple event processing cases. There are a ton of data transformations of format/schema, feature extraction, etc. that are done in NiFi all the time. We've stopped short of going into the complex analytics or providing first class support for windowing.

There are some really important and differentiated features in NiFi that you'll want to consider for your case. If they're not important for you then it is probably not worth using NiFi.

First, NiFi offers an interactive command and control model that works on both single nodes and clusters of NiFi systems, whereby changes you make take effect immediately on the running system. This is a very powerful construct that allows authorized users to make live changes to the flow as data is flowing. This makes for highly rapid evolution from development phases through production. We made branching data extremely cheap and easy thanks to the underlying content repository approach we have.

Second, we have an extremely fine grained data tracking/data provenance capability. This makes replay, click-to-content, and troubleshooting an extremely powerful part of the tool. It shows you how data came into NiFi, what we learned about it, what we did to it, where we sent it, whether we dropped it, etc.
It works even across complex flow graphs with branching, merging, etc. You really have to check that part out.

The other point I'll build on is the content repository I mentioned. NiFi is a system which can handle extremely large objects right next to really small objects. That is the case because of how its repositories and API work. Other systems tend to load data into memory and become very memory sensitive for even really easy cases, or have questionable data guarantees unless flows are built in very simple linear chains. We talk about the repositories a lot more here [1].

Be sure to take a look at the activity of the community and the projects overall as you think through these things.

Hopefully this helps a bit.

Thanks
Joe

[1] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html

On Tue, Sep 5, 2017 at 5:58 PM, Ryan Riddel <[email protected]> wrote:
> Hi,
>
> I've been studying possible products for GUI-based pipeline creation, and
> the three applications that I've come across are Apache Nifi, Spring XD,
> and Cask CDAP. Cask runs Spark under the hood, and so it's too slow, but
> the differences between NiFi and Spring XD seem to me more subtle. They
> are both performant enough to handle my requirements: <25ms end-end latency
> for a simple pipeline, with 600GB/day of throughput (500Mbps peak)
>
> I work at a prop (trading) shop, and my goal is to make a platform with
> which traders can implement their own algorithms without writing a line of
> code. NiFi and Spring XD seem very similar, except XD seems to be slightly
> more powerful (where NiFi can't do joining and complex windowing, XD can).
>
> I've trawled both mailing lists, but haven't found such a comparison.
> Would anyone care to add some points of comparison between Spring XD and
> NiFi? I'd be eager to contribute to the conversation with whatever stuff
> I've learned.
>
> Ryan
