There has been a lot of internal discussion at work about the actual life 
cycle of the Processor class instance that NiFi instantiates, and of the 
variables scoped at the class level. When do Processors "reset"? Are new 
instances created for each run, or are instances recycled? What about 
concurrent threads and thread safety?

I'm sure several developers here on the list could have easily answered these 
questions :), but I decided to do some research on my own. I built a test 
processor that increments a private integer field that is either thread safe 
or non-thread safe, depending on a property you set on the processor. Just to 
share and discuss, below are my tests and results. The value is stored only in 
the private field; no state management is used.


-          Test 1: 1 Concurrent Thread, Non-Thread Safe

o   The purpose of this test is to find out what happens to a Processor's 
state between the processing of individual FlowFiles.

o   After 10,000 files the value was 10,000 on the last file. This means that 
state is maintained in a processor between runs (this was what I assumed, but a 
good place to start). Each execution of the processor used the same instance of 
the class.

-          Test 2: Stop the processor, then start it again and run a single file

o   The purpose of this test is to figure out when a processor "resets" its 
state.

o   After 1 file the value was 10,001.

o   This means that stopping and starting a processor does not reset the 
processor. The same class instance is still used.

-          Test 3: Stop, Disable, Enable, and then start again.

o   The purpose of this test is to see if disabling a processor causes the 
class instance to be disposed of.

o   After 1 file the value was 10,002.

o   This means that disabling and re-enabling a processor does not reset its 
state. The same class instance persists.

-          Test 4: Starting with a new copy of the test processor, run 10 
concurrent threads, non-thread safe

o   The purpose of this test is to see if each thread uses its own instance of 
the class, or if a shared instance of the class is used.

o   After 10,000 files, the value was 9,975 on the last file (I didn't run this 
test more than once, but I'd expect the value to vary from run to run, since 
unsynchronized increments from different threads can overwrite each other).

o   I saw at least 8 concurrent threads running at one point. Given that and 
the resulting value, I'm fairly confident that the same class instance is 
shared by all concurrent threads.

-          Test 5: Starting with a new copy of the test processor, run 1 
Concurrent Thread, Thread-Safe

o   The purpose of this test is to find out what happens to a Processor's 
state between the processing of individual FlowFiles.

o   After 10,000 files the value was 10,000 on the last file

o   This means that state is maintained in a processor between runs (this 
matches the non-thread safe results, which makes sense for 1 concurrent thread).

-          Test 6: Starting with a new copy of the test processor, run 10 
Concurrent Threads, Thread-Safe

o   The purpose of this test is to contrast with the non-thread safe approach, 
and verify that a thread-safe object will work across concurrent threads.

o   After 10,000 files the value was 10,000 on the last file

o   This means that thread synchronization works with multiple concurrent 
threads for a single class instance.
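
For anyone who wants to reproduce the effect from Tests 4 and 6 outside of 
NiFi, here is a minimal sketch in plain Java. It is not my test processor and 
it does not use the NiFi API: CountingProcessor, trigger() and 
SharedInstanceDemo are hypothetical names made up for this example, and I'm 
assuming an AtomicInteger for the thread-safe case. The only point is that a 
single shared instance is called from several threads, once per "file".

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a processor class; this is NOT the NiFi API.
class CountingProcessor {
    private int unsafeCount = 0;                                  // plain field, no synchronization
    private final AtomicInteger safeCount = new AtomicInteger();  // assumed thread-safe variant

    // Plays the role of one trigger of the processor for one file.
    void trigger(boolean threadSafe) {
        if (threadSafe) {
            safeCount.incrementAndGet();  // atomic read-modify-write
        } else {
            unsafeCount++;                // separate read, add, write: updates can be lost
        }
    }

    int value(boolean threadSafe) {
        return threadSafe ? safeCount.get() : unsafeCount;
    }
}

class SharedInstanceDemo {
    // Runs `files` triggers against ONE shared instance using `threads` workers
    // and returns the final counter value.
    static int run(int threads, int files, boolean threadSafe) {
        CountingProcessor processor = new CountingProcessor();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < files; i++) {
            pool.execute(() -> processor.trigger(threadSafe));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processor.value(threadSafe);
    }

    public static void main(String[] args) {
        System.out.println("10 threads, thread-safe:     " + run(10, 10_000, true));
        System.out.println("10 threads, non-thread-safe: " + run(10, 10_000, false));
    }
}
```

The thread-safe run should always report 10,000. The non-thread-safe run can 
report less, but never more, and the exact number varies from run to run (it 
may even land on 10,000 if the threads happen not to collide).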

These tests ran on a single NiFi instance with no clustering, and they are not 
designed to say anything about clustering.

Based upon my limited test results, a Processor class is never re-instantiated 
unless the Processor is deleted from the flow (... yeah, kind of like cheating) 
or NiFi restarts. There are of course other tests that could be run, and I 
welcome any feedback!

Thanks,
  Peter
