There had been a lot of internal discussions at work about the actual life cycle of the class instance that NiFi instantiates for a Processor, and of the variables scoped at the class level. When do Processors "reset"? Are new instances created for each run, or are instances recycled? What about concurrent threads and thread safety?
I'm sure several developers here on the list could have easily answered these questions :), but I decided to do some research on my own. I built a test processor that increments a private class-level integer, either non-thread-safe or thread-safe depending on the property you choose in the processor. Just to share and discuss, below are my tests and results. The value is stored only in the private variable; no state management is used.

- Test 1: 1 Concurrent Thread, Non-Thread Safe
    o The purpose of this test is to find out what happens to a Processor's state between executions on Flow Files.
    o After 10,000 files the value was 10,000 on the last file.
    o This means that state is maintained in a processor between runs (this was what I assumed, but a good place to start). Each execution of the processor used the same instance of the class.

- Test 2: Stop the processor, then start it again and run a single file
    o The purpose of this test is to figure out when a processor "resets" its state.
    o After 1 file the value was 10,001.
    o This means that stopping and starting a processor does not reset the processor. The same class instance is still used.

- Test 3: Stop, Disable, Enable, and then start again
    o The purpose of this test is to see if disabling a processor causes the class instance to be disposed of.
    o After 1 file the value was 10,002.
    o This means that disabling and re-enabling a processor does not reset its state. The same class instance persists.

- Test 4: Starting with a new copy of the test processor, run 10 Concurrent Threads, Non-Thread Safe
    o The purpose of this test is to see if each thread uses its own instance of the class, or if a shared instance of the class is used.
    o After 10,000 files, the value was 9,975 on the last file (I didn't run this test more than once, but the value should fluctuate from run to run because unsynchronized increments from different threads can overwrite one another).
    o I saw at least 8 concurrent threads running at one point. Combined with the resulting value, this makes me fairly confident that the same class instance is used for all concurrent threads.

- Test 5: Starting with a new copy of the test processor, run 1 Concurrent Thread, Thread-Safe
    o The purpose of this test is to find out what happens to a Processor's state between executions on Flow Files.
    o After 10,000 files the value was 10,000 on the last file.
    o This means that state is maintained in a processor between runs (this matches the non-thread-safe results, which makes sense for 1 concurrent thread).

- Test 6: Starting with a new copy of the test processor, run 10 Concurrent Threads, Thread-Safe
    o The purpose of this test is to contrast with the non-thread-safe approach, and verify that a thread-safe object will work across concurrent threads.
    o After 10,000 files the value was 10,000 on the last file.
    o This means that thread synchronization works with multiple concurrent threads for a single class instance.

These tests ran on a single NiFi instance with no clustering, and are not designed to say anything about clustering.

Based upon my limited test results, a Processor class is never re-instantiated unless the Processor is deleted from the flow (... yeah, kind of like cheating) or NiFi restarts.

There are of course other tests that could be run, and I welcome any feedback!

Thanks,
Peter
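
P.S. For anyone who wants to see the difference between Test 4 and Test 6 without a NiFi install, here is a minimal plain-Java sketch. It is not my test processor and does not use the NiFi API: `SharedInstanceDemo`, `FakeProcessor`, and `run` are made-up names, and a fixed thread pool stands in for NiFi's concurrent tasks. It only assumes what the tests suggest, namely that all threads call into one shared instance.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedInstanceDemo {

    // Hypothetical stand-in for the test processor: one instance, two counters.
    static class FakeProcessor {
        private int plainCounter = 0;                                   // non-thread safe
        private final AtomicInteger safeCounter = new AtomicInteger();  // thread safe

        // Stand-in for one trigger of the processor handling one flow file.
        void onTrigger(boolean threadSafe) {
            if (threadSafe) {
                safeCounter.incrementAndGet();  // atomic read-modify-write
            } else {
                plainCounter++;                 // unsynchronized; updates can be lost
            }
        }

        int value(boolean threadSafe) {
            return threadSafe ? safeCounter.get() : plainCounter;
        }
    }

    // Pushes `files` triggers through `threads` workers that all share ONE instance.
    static int run(boolean threadSafe, int threads, int files) throws InterruptedException {
        FakeProcessor processor = new FakeProcessor();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < files; i++) {
            pool.submit(() -> processor.onTrigger(threadSafe));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return processor.value(threadSafe);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("thread safe, 10 threads:     " + run(true, 10, 10_000));
        System.out.println("non-thread safe, 10 threads: " + run(false, 10, 10_000));
    }
}
```

The thread-safe run should always report 10,000. The non-thread-safe run can come in at or below 10,000 and may differ from run to run, which is the same kind of shortfall I saw in Test 4.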