turcsanyip commented on PR #8013:
URL: https://github.com/apache/nifi/pull/8013#issuecomment-1885908417

   > > I have one more open question: cleaning up the obsolete items from the 
state when the user changes the flow and configures e.g. a new event hub or 
consumer group on the processor. In this case the old ownership and checkpoint 
data remains persisted in the state currently. Clearing the state is the user's 
responsibility now. On the other hand, it also means that the user has the 
option to go back to the original settings and continue with those checkpoints. 
In general, I would opt for cleaning up the state after configuration changes 
and storing only the current checkpoints (state size, no outdated items). 
Unfortunately, there is no trivial way to clear the state 
(`StateManager.clear()` cannot be used due to the concurrent access in the 
cluster) and it was implemented without clean-up in the original PR so I did 
not change it. However, now it looks feasible to me and it may be worth 
implementing the clean-up logic. What is your opinion?
   > 
   > Thanks for the updates @turcsanyip, sorry for the delay in response.
   > 
   > Given that other components require manual intervention to clear the 
state, that seems reasonable on its own. That might also be more intuitive than 
automatically clearing the state based on configuration changes, even though it 
requires an extra step.
   > 
   > I plan to take a closer look at the other changes soon, but otherwise this 
looks close to completion, so perhaps that is worth considering as a follow-on 
task?
   
   Thanks for your answer @exceptionfactory! Yes, we can implement it in a 
follow-on task.
   
   For example, the list processors allow clearing the state manually if the 
user wants to start over and list all items again. They also clear the state 
automatically when a configuration change requires a reset (e.g. after setting 
a new base directory for listing, the previous last-file timestamp does not 
make sense for the new directory). These processors can store the state (last 
timestamp) for only one target, which is why the automatic reset is needed on 
config change.
   ConsumeAzureEventHub stores the state tagged by the target, that's why it is 
possible to store states for multiple targets. The drawback is that the old 
data is also transferred back and forth between NiFi and the state provider 
continuously. Also, the user cannot remove the old data later on without 
clearing the whole state, including the current checkpoints (so it will most 
likely remain stuck in the state unless it was cleared manually right after the 
config change).
   The more I think about it, the more I'm inclined to implement the 
automatic clean-up...
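
   To make the idea concrete, here is a rough sketch of what the clean-up could 
look like: keep only the state entries that belong to the currently configured 
event hub and consumer group, and drop everything else. The class name, the 
method names and the `<eventHub>/<consumerGroup>/...` key layout are 
assumptions for illustration only, not the actual key format used by the 
processor. It works on a plain `Map`, so it does not touch `StateManager` at 
all; in the processor, the filtered map would presumably have to be written 
back with a compare-and-set style update rather than `StateManager.clear()`, 
because of the concurrent access in the cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StateCleanupSketch {

    // Hypothetical key layout: "<eventHub>/<consumerGroup>/<entry>".
    // The real state keys of the processor may be structured differently.
    public static String targetPrefix(String eventHubName, String consumerGroup) {
        // The trailing slash prevents "group-1" from also matching "group-10".
        return eventHubName + "/" + consumerGroup + "/";
    }

    // Returns a new map containing only the entries of the current target.
    // The input map is left unmodified.
    public static Map<String, String> retainCurrentTarget(Map<String, String> state,
                                                          String currentPrefix) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : state.entrySet()) {
            if (entry.getKey().startsWith(currentPrefix)) {
                cleaned.put(entry.getKey(), entry.getValue());
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("hub-a/group-1/checkpoint/0", "100"); // left over from the old config
        state.put("hub-b/group-1/checkpoint/0", "42");  // current config

        Map<String, String> cleaned =
                retainCurrentTarget(state, targetPrefix("hub-b", "group-1"));
        System.out.println(cleaned.keySet());
    }
}
```

   The trade-off is the one discussed above: after such a clean-up the user can 
no longer switch back to the old settings and continue from the old 
checkpoints.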


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.