turcsanyip commented on PR #8013: URL: https://github.com/apache/nifi/pull/8013#issuecomment-1885908417
> > I have one more open question: cleaning up the obsolete items from the state when the user changes the flow and configures e.g. a new event hub or consumer group on the processor. In this case the old ownership and checkpoint data remains persisted in the state currently. Clearing the state is the user's responsibility now. On the other hand, it also means that the user has the option to go back to the original settings and continue with those checkpoints. In general, I would opt for cleaning up the state after configuration changes and storing only the current checkpoints (state size, no outdated items). Unfortunately, there is no trivial way to clear the state (`StateManager.clear()` cannot be used due to the concurrent access in the cluster) and it was implemented without clean-up in the original PR so I did not change it. However, now it looks feasible to me and it may be worth implementing the clean-up logic. What is your opinion?
>
> Thanks for the updates @turcsanyip, sorry for the delay in response.
>
> Given that other components require manual intervention to clear the state, that seems reasonable on its own. That might also be more intuitive than automatically clearing the state based on configuration changes, even though it requires an extra step.
>
> I plan to take a closer look at the other changes soon, but otherwise this looks close to completion, so perhaps that is worth considering as a follow-on task?

Thanks for your answer @exceptionfactory! Yes, we can implement it in a follow-on task.

For example, the list processors allow clearing the state manually if the user wants to start over and list all items again. The state is cleared automatically if there is a configuration change that requires a reset (e.g. setting a new base directory for listing, in which case using the previous last file timestamp does not make sense for the new directory).
These processors can store the state (last timestamp) for only one target, which is why an automatic reset is needed on a config change. ConsumeAzureEventHub stores the state tagged by the target, which is why it can store states for multiple targets. The drawback is that the old data is also transferred back and forth between NiFi and the state provider continuously. Also, the user cannot remove the old data later on without clearing the whole state, including the current checkpoints (so most probably it will remain stuck in the state if it was not cleared manually right after the config change).

The more I think about it, the more I'm inclined to implement the automatic clean-up...
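
To make the clean-up idea concrete, here is a minimal sketch of pruning state entries that belong to a previously configured target. It is not the processor's actual code: the key layout (`<kind>/<eventHub>/<consumerGroup>/<partition>`), the class name, and the method names are all assumptions for illustration, and the state is modeled as a plain `Map<String, String>` rather than through the NiFi state API. Writing the pruned map back safely under concurrent access in a cluster is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the NiFi code base.
class StateCleanupSketch {

    // Assumed key layout: "<kind>/<eventHub>/<consumerGroup>/<partitionId>".
    // The trailing slash prevents "hub-a" from matching "hub-ab".
    static String targetPrefix(String kind, String eventHub, String consumerGroup) {
        return kind + "/" + eventHub + "/" + consumerGroup + "/";
    }

    // Returns a copy of the state that keeps only the ownership and checkpoint
    // entries tagged with the currently configured event hub and consumer group.
    static Map<String, String> retainCurrentTarget(Map<String, String> state,
                                                   String eventHub,
                                                   String consumerGroup) {
        String ownershipPrefix = targetPrefix("ownership", eventHub, consumerGroup);
        String checkpointPrefix = targetPrefix("checkpoint", eventHub, consumerGroup);

        Map<String, String> pruned = new HashMap<>();
        for (Map.Entry<String, String> entry : state.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(ownershipPrefix) || key.startsWith(checkpointPrefix)) {
                pruned.put(key, entry.getValue());
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state.put("checkpoint/hub-a/group-1/0", "100"); // left over from the old config
        state.put("ownership/hub-a/group-1/0", "node-1");
        state.put("checkpoint/hub-b/group-1/0", "7");   // current config

        System.out.println(retainCurrentTarget(state, "hub-b", "group-1"));
    }
}
```

The trade-off mentioned above still applies: once the old entries are dropped, the user can no longer switch back to the original settings and resume from the old checkpoints.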
