[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793816#comment-17793816
 ] 

Simon Bence commented on NIFI-12236:
------------------------------------

Thanks for the clarification! Let me give some context for the different 
questions.

The idea:
- The functionality this provides is almost identical to the original
implementation, apart from error management. The same is true of the
possible parametrization, which however was not fully exposed as properties
before. Now some of these settings (like the mentioned batch size) are exposed
so that they can be tuned if needed.

The implementation options:
- I completely agree with you that in the long run it would be beneficial to
have a variety of possible storage mechanisms. I also think opening up the
Status History Registry as a pluggable service would be a good step forward.
However, that was not the aim of this PR. The PR focused mainly on
"bulletproofing" the existing implementation, not on extending its capabilities.

The implementation:
- I think this is the same case as the previous one. The PR did not aim to
extend the feature but to make it safer for production use. Personally, I
think the changes that were added can serve as a good basis for future
improvements, but those should be the subject of further PRs.
- In the PR I provided a short reasoning for the API changes to
[~exceptionfactory] and I consider it an ongoing discussion. The same result
might be achieved without API changes, but please find my reasoning there.
I do not insist on this approach, but I see some merit in it.
- The Retry mechanism is actively used by the QuestDB implementation as part of
error management. It is possible that I was not clear about that in the PR
summary, but it is not something for a later commit. I separated it from the
QuestDB-related code for two reasons: 1. to have a more focused code
structure, and 2. because it is a possible tool for later use within the
project.
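
To illustrate the general idea of such a standalone retry helper, here is a
minimal sketch. It is not the code from the PR: the class name RetrySketch,
the method signature, the fixed delay and the fallback parameter are all
hypothetical and only show how a retry can be kept independent of any
QuestDB-specific code.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not the class used in the PR.
public final class RetrySketch {
    private RetrySketch() {
    }

    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis between
     * failed attempts. If every attempt fails (or the thread is interrupted
     * while waiting), the fallback value is returned instead of an exception.
     */
    public static <T> T retry(final int maxAttempts, final long delayMillis,
                              final Callable<T> action, final Supplier<T> fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (final Exception e) {
                // Wait before the next attempt, but not after the last one.
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (final InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
        return fallback.get();
    }
}
```

A caller could wrap a database operation in such a helper and supply a
fallback that skips the operation, so that a persistent failure costs only the
affected metrics rather than propagating further.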

The default selection:
- This topic has already been touched on in the PR, and I am completely okay
with moving that part out of the commit. In the long term I would see benefits
in making it the default (the mail thread started with this and I still hold
the same position), but of course let's be sure about its safety before
moving on.

I hope I could give some useful information and clear some fog around the PR.
If you still have conflicted feelings about the approach (and I deliberately
do not use the phrase "final result", as I see some possible future steps),
please share them.

> Improving fault tolerancy of the QuestDB backed metrics repository
> ------------------------------------------------------------------
>
>                 Key: NIFI-12236
>                 URL: https://issues.apache.org/jira/browse/NIFI-12236
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Simon Bence
>            Assignee: Simon Bence
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Based on the related discussion on the dev email list, the QuestDB handling
> of the metrics repository needs to be improved to have better fault tolerance
> so that it can be used as a viable option for the default metrics data
> store. This should primarily focus on handling unexpected database events like
> a corrupted database or loss of space on the disk. Any issues should be handled
> with an attempt to keep the database service healthy, but if that is
> impossible, the priority is to keep NiFi and the core services running, even
> at the price of a metrics collection / presentation outage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
