Guys,

Certainly some great points here and important concepts to keep in mind!
One thing to remember, though, that very much differentiates NiFi from Storm or HBase or Accumulo: those systems are typically expected to scale to hundreds or thousands of nodes (you mention a couple hundred node HBase cluster being "modest"). With NiFi, we are typically operating clusters on the order of several nodes to dozens of nodes. A couple hundred nodes would be a pretty massive NiFi cluster.

Storing state is where we could potentially get into more of a sticky situation if it is not done carefully. However, we generally expect the "State Management" feature to be used for occasionally storing small amounts of state, such as ListHDFS storing 2 timestamps, and we don't expect ListHDFS to be continually hammering HDFS asking for a new listing. It may be scheduled to run once every few minutes, for example.

That said, we certainly have been designing everything using appropriate interfaces, so that if we later decide we want to use some other mechanism for storing state and/or heartbeats, there should be a very reasonable path to using some other service for one or both of these tasks.

Thanks
-Mark

> On Mar 31, 2016, at 5:35 PM, Sean Busbey <[email protected]> wrote:
>
> HBase has also had issues with ZK at modest (~couple hundred node)
> scale when using it to act as an intermediary for heartbeats and to do
> work assignment.
>
> On Thu, Mar 31, 2016 at 4:33 PM, Tony Kurc <[email protected]> wrote:
>> My commentary that didn't accompany this - it appears Storm was using
>> zookeeper in a similar way as the road we're heading down, and it was such
>> a major bottleneck that they moved key value storage and heartbeating out
>> into separate services, and then re-engineering (i.e. built Heron). Before
>> we get too dependent on zookeeper, may be worth learning some lessons from
>> the crew that built Heron or from a team that learned zookeeper lessons
>> scale like accumulo.
>>
>> On Thu, Mar 24, 2016 at 6:22 PM, Tony Kurc <[email protected]> wrote:
>>
>>> I mentioned slides I saw at the meetup about zookeeper perils at scale in
>>> storm, here are slides, i couldn't find a video after some limited
>>> searching.
>>> https://qconsf.com/system/files/presentation-slides/heron-qcon-2015.pdf
>>>
