Thanks for the feedback from everyone!

As I understand it, the proposal is supported, and with some preparation
(covering the cases mentioned) it can be done. I will raise PRs in the
foreseeable future to address these questions.

Regards,
Bence

> On 2023. Jul 19., at 16:01, David Handermann <exceptionfact...@apache.org> 
> wrote:
> 
> Thanks for the suggestion and additional background, Bence; that is very
> helpful in evaluating the default inclusion approach.
> 
> I agree with Joe's concern about handling potential corruption. We have
> recently reduced dependency on the H2 file-backed database driver so that
> it is now limited to Flow Configuration History. Based on experience there,
> NiFi can fail to start when the database file is corrupted, which is not
> ideal. We should look into improving that behavior, allowing NiFi to start
> and saving off the corrupted file instead of failing to start. If we go
> forward with QuestDB as the default strategy for status history, we should
> build in the resilient approach as a prerequisite to enabling it in the
> default configuration.
> 
> Regards,
> David Handermann
> 
> On Wed, Jul 19, 2023 at 8:31 AM Simon Bence <simonbence....@gmail.com>
> wrote:
> 
>> Thanks for the quick feedback!
>> 
>> Joe: your concerns are relevant, let me provide some details:
>> 
>> The database uses some disk space, determined by the number of components
>> and the number of covered days. While adding it I checked the usage, and
>> although I no longer have the numbers, it seemed reasonable. I can do a
>> bit of testing and bring some numbers to improve confidence in it.
>> Additionally, the necessary disk space is bounded: the rollover handling
>> limits the amount of stored data to the target number of days plus one.
>> This is due to a limitation of QuestDB partitioning: at the time of
>> development the smallest partition strategy was day based, if I remember
>> correctly, so the unit of deletion was the partition that had just shifted
>> past the threshold. (Hour-based partitioning now appears to be available,
>> which might be worth the effort of upgrading.)
>> 
>> The current rollover deletes all data older than the threshold, but I am
>> thinking of adding a new implementation that keeps some aggregated
>> information about the components. That of course needs some more space,
>> again depending on the number of components and the time covered.
>> 
>> If the disk is full, we have no way to write metrics to the database, and
>> currently there is no fallback strategy for this. One possibility would be
>> to keep the data in memory temporarily (similar to the
>> VolatileComponentStatusRepository in that regard), but I am not convinced
>> that, for a node whose resources are already close to their limits,
>> writing data to memory instead of to disk would necessarily be a good
>> strategy. This is something to consider.
>> 
>> If the database becomes corrupted, then we lose the status information.
>> I think this is true for most persistent storage; however, I would expect
>> that as long as the database files are not changed by external tools, the
>> chance of this is insignificant. Fallback strategies might be added (for
>> example, if NiFi considers the database corrupted, it might start a new
>> one), but even without this I think the QuestDB-based solution has its
>> merits compared to the in-memory storage.
>> 
>> Manual intervention should not be needed. Currently the configuration
>> must be changed in order to use this capability, but if we made this the
>> default, it should work without any additional interaction.
>> 
>> Regards,
>> Bence
>> 
>>> On 2023. Jul 19., at 14:57, Joe Witt <joe.w...@gmail.com> wrote:
>>> 
>>> Agree functionally
>>> 
>>> How does this handle disk usage? Any manual intervention needed? What if
>>> the disk is full where it writes? What if the db somehow becomes
>>> corrupted?
>>> 
>>> I'd like to ensure this thing is zero ops as much as possible such that in
>>> error conditions it resets and gets going again.
>>> 
>>> Thanks
>>> 
>>> On Wed, Jul 19, 2023 at 8:55 AM Pierre Villard
>>> <pierre.villard...@gmail.com> wrote:
>>> 
>>>> I do think this provides great value. The possibility to get access to
>>>> the status history of the components and at system level across restarts
>>>> is a great improvement for NiFi troubleshooting. It also gives the
>>>> ability to store this information for a longer period of time. I'm
>>>> definitely in favor of making this the default starting with NiFi 2.0.
>>>> 
>>>> On Wed, Jul 19, 2023 at 13:49, Simon Bence <simonbence....@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Community,
>>>>> 
>>>>> I was wondering whether it would make sense to set QuestDB as the
>>>>> default status history backend in 2.0. It has been there for a while
>>>>> and I would consider it a step forward, so the new major version might
>>>>> be a good time to bring it to a wider audience. It comes with less
>>>>> memory usage for bigger flows and the possibility of checking status
>>>>> information from before the node was stopped or restarted, so I think
>>>>> it is worth consideration. Any insight or suggestion for improvement
>>>>> is appreciated, thanks!
>>>>> 
>>>>> Regards,
>>>>> Bence
>>>> 
>> 
>> 
