[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793786#comment-17793786 ]
Joe Witt commented on NIFI-12236:
---------------------------------

I want to keep the different threads of concern clear.

The idea:
* I think we all agree there is good value in having these data points persisted across restarts. The example Pierre gives is a perfect example of why.
* The reality, though, is that our user interface is not designed for data to be held for long. It isn't clear to me from the properties mentioned in the PR: what is the plan for helping the user configure how long this data is retained/queryable? I see batch size and frequency, but I'm not sure what those would mean relative to retention. Perhaps that is part of the earlier implementation.

The implementation options:
* Offer an embedded state storage mechanism. QuestDB is one example. This could be a database per NiFi instance or a database per cluster. This embedded/batteries-included mode is quite convenient for the user, but we then of course have to do it quite thoroughly and consider upgrade scenarios. I think our pucker factor here is far higher given the challenges we've had to work through with H2 in the last year or two.
* Offer the ability to connect to/use a database of the user's choosing. Defer installation/durability/security of that database to the user as part of their normal database operations. I find this more in line with the deployment styles we see in the cloud, automated with Ansible, or deployed on K8s.

This implementation:
* It is a QuestDB per NiFi node. It does not address the other mode should a user want that. Of note, when we offered the embedded ZooKeeper mode we also offered the ability to connect to a real ZooKeeper install. That was to reflect the likely non-prod vs. prod usage, and I think that holds here as well.
* It has a change to the nifi-api. We should strongly avoid any such API changes as part of this activity unless that change has value to various other components and the purpose/meaning of that change is very clear.
* It includes a Retry mechanism that the PR suggests might be addressed in a later commit. I don't quite follow what that really means, but I recommend making this implementation as simple/straightforward as possible.

The default selection:
* We should not be changing the default until this model is proven to be stable and recoverable in the same manner as the current in-memory implementation.

> Improving fault tolerancy of the QuestDB backed metrics repository
> ------------------------------------------------------------------
>
>                 Key: NIFI-12236
>                 URL: https://issues.apache.org/jira/browse/NIFI-12236
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Simon Bence
>            Assignee: Simon Bence
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Based on the related discussion on the dev email list, the QuestDB handling
> of the metrics repository needs to be improved to have better fault tolerance
> so that it can be used as a viable option for the default metrics data
> store. This should primarily focus on handling unexpected database events like
> a corrupted database or loss of space on the disk. Any issues should be handled
> with an attempt to keep the database service healthy, but in case that is
> impossible, the priority is to keep NiFi and the core services running, even
> at the price of a metrics collection / presentation outage.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)