Also of note, the distributed cache service is probably the closest thing we have to a cluster-wide framework state management service. It currently uses our own persistence backend, but it could conceivably be adapted to use a database, a JNDI resource, or a true cache engine such as Ehcache.

Adam

On Wed, Jan 14, 2015 at 7:12 AM, Joe Witt <[email protected]> wrote:

> Joe - thanks for bumping this.
>
> Bryan,
>
> "What are the best practices for implementing a processor that needs to
> maintain some kind of state?
>
> I'm thinking of a processor that executes on a timer and pulls data from
> somewhere, but needs to know where it left off for the next execution, and
> I was hoping to not involve an external data store here."
>
> The only managed state the framework provides is through the use of Flow
> File objects and the passing of them between processors. To keep
> persistent accounting for a given processor of some state of what it's
> doing that exists outside of that, you do need to implement some state
> persistence mechanism (to a file, to a database, etc.).
>
> One example of a processor that does this is the GetHttp processor. It
> interacts with web services and in so doing needs to keep track of any
> cache/E-Tag information it receives so it can be smart about pulling the
> same resource or not depending on whether the server indicates it has
> changed. How this processor does this is by saving off a file in
> 'conf/.httpCache-<<processor uuid>>'. This use of the processor uuid in
> the name avoids conflicts with other processors of the same type and makes
> referencing it on startup very easy. If it is there, use it to recover
> state, and if not, start a new one.
>
> That said, it is clearly desirable for the framework to offer some sort of
> managed state mechanism for such simple cases. We've talked about this
> many times over the years but just never pulled the trigger because there
> was always some aspect of our design ideas we didn't like. So for right
> now you'll need to implement state persistence like this outside the
> framework. But I've also kicked off a Jira for doing something about this
> here: https://issues.apache.org/jira/browse/NIFI-259
>
> What you were seeing in GetKafka and GetJMS processors was management of
> state that involves interaction with their specific resources (Kafka,
> JMS). In the case of JMS it was a connection pooling type mechanism and in
> the case of Kafka it was part of Kafka's stream iterator. That is a
> different thing than this managed persistent state you're asking about.
>
> This is an important topic for us to communicate very well on. Please feel
> free to keep firing away until we've answered it fully.
>
> Thanks
> Joe
>
> On Wed, Jan 14, 2015 at 5:06 AM, Joe Gresock <[email protected]> wrote:
>
> > I'm also interested in the answers to Bryan's questions, if anyone has
> > some input.
> >
> > Thanks,
> > Joe
> >
> > On Fri, Jan 9, 2015 at 3:50 PM, Bryan Bende <[email protected]> wrote:
> >
> > > What are the best practices for implementing a processor that needs to
> > > maintain some kind of state?
> > >
> > > I'm thinking of a processor that executes on a timer and pulls data
> > > from somewhere, but needs to know where it left off for the next
> > > execution, and I was hoping to not involve an external data store here.
> > >
> > > From looking at processors like GetJMS and GetKafka, I noticed the use
> > > of BlockingQueue<> where poll() is called at the beginning of
> > > onTrigger(), and then the object is put back in the queue in a finally
> > > block.
> > >
> > > As far as I could tell it looks like the intent was to only have one
> > > object in the queue, and use the queue as the mechanism for
> > > synchronizing access to the shared object, so that if another thread
> > > called onTrigger it would block on poll() until the previous execution
> > > put the object back in the queue.
> > >
> > > Is that the general approach?
> > >
> > > Thanks,
> > >
> > > Bryan
> >
> > --
> > I know what it is to be in need, and I know what it is to have plenty. I
> > have learned the secret of being content in any and every situation,
> > whether well fed or hungry, whether living in plenty or in want. I can do
> > all this through him who gives me strength. *-Philippians 4:12-13*

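For anyone reading this thread later, here is a rough sketch that combines the two ideas discussed above: a state file named after the processor uuid (in the spirit of 'conf/.httpCache-<<processor uuid>>') and a single-slot BlockingQueue used to hand the state object to one thread at a time. It is plain Java and is not taken from GetHttp, GetJMS, or GetKafka. The class name CheckpointedPoller, the ".state-" file prefix, the "offset" key, and the runOnce method are all made up for illustration. One detail worth noting: the no-argument poll() on a BlockingQueue returns null immediately when the queue is empty, so the sketch uses the timed poll(timeout, unit) to actually wait.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of the NiFi API. */
final class CheckpointedPoller {
    private final Path stateFile;
    // Capacity 1: whoever holds the Properties object has exclusive access to it.
    private final BlockingQueue<Properties> slot = new LinkedBlockingQueue<>(1);

    CheckpointedPoller(Path confDir, String processorUuid) throws IOException {
        // One file per processor instance, so instances of the same type do not collide.
        this.stateFile = confDir.resolve(".state-" + processorUuid);
        Properties state = new Properties();
        if (Files.exists(stateFile)) {
            // Recover where we left off; otherwise start with empty state.
            try (InputStream in = Files.newInputStream(stateFile)) {
                state.load(in);
            }
        }
        slot.add(state);
    }

    /**
     * One timer-driven execution. Returns the new offset, or -1 if another
     * thread held the state for the whole wait.
     */
    long runOnce(long waitMillis) throws IOException, InterruptedException {
        Properties state = slot.poll(waitMillis, TimeUnit.MILLISECONDS);
        if (state == null) {
            return -1;
        }
        try {
            // Stand-in for "pull data from somewhere": just advance a counter.
            long next = Long.parseLong(state.getProperty("offset", "0")) + 1;
            state.setProperty("offset", Long.toString(next));
            save(state);
            return next;
        } finally {
            // Hand the state back even if the work above threw.
            slot.add(state);
        }
    }

    private void save(Properties state) throws IOException {
        // Write to a temp file and move it into place, which reduces the
        // chance of leaving a half-written state file behind.
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            state.store(out, "processor state");
        }
        Files.move(tmp, stateFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A new CheckpointedPoller created with the same directory and uuid after a restart picks up the last saved offset, while one created with a different uuid starts from scratch. A real processor would likely want to decide what to do when the state file is corrupt, which this sketch does not handle.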