[ 
https://issues.apache.org/jira/browse/NIFI-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684567#comment-15684567
 ] 

Mark Payne commented on NIFI-3039:
----------------------------------

[~jskora] - Thanks for digging into this guy... the code can get pretty intense 
here :)

The ramifications of this change do concern me a bit, though. I can see why you 
may get smoother performance when you start aging off aggressively at 90%. 
Given the code change and the difference that it makes, though, my guess is 
that the "thrashing" that you mention is due to large spikes of data, rather 
than data that is processed at a very consistent data rate. As a result, this 
change would give smoother performance as long as that spike didn't last "too 
long." I say this because if you hit 90% and then you see it continue to rise 
up to 100% usage, that means that the deletions were not keeping up. As a 
result, this change would give the system more time, allowing for 30% of the 
repo size to be filled up while the deletions were falling behind instead of 
the existing 10%. We are doing this, though, at the expense of giving up a 
pretty large amount of storage. Consider, for instance, if I have a 1 TB 
provenance repo. This means that once I hit 900 GB usage I'll not stop deleting 
until I'm down to only 700 GB used -- I'd be giving up around 300 GB of 
provenance data.

I would just be very hesitant to start trading storage for potential 
performance consistency when it's not necessary in all cases. If this change is 
needed, perhaps we should make these values configurable, defaulting to 
something like 90% as the high-water mark and 87% as the low-water mark? This 
would leave it fairly consistent with how it operates now while providing the 
flexibility needed for the different cases.
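
To make the storage trade-off concrete, here is a rough sketch of a purge 
calculation with configurable water marks. The function name and its 
parameters are hypothetical, made up for illustration; this is not the 
actual NiFi code, just the arithmetic from the example above.

```python
GB = 10**9  # decimal gigabyte, used only to keep the example readable


def bytes_to_purge(used_bytes, max_bytes, high_water_pct=90, low_water_pct=87):
    """Return how many bytes to age off (hypothetical helper, not NiFi's API).

    Purging starts once usage reaches the high-water mark and continues
    until usage is back down at the low-water mark.
    """
    if not 0 < low_water_pct <= high_water_pct <= 100:
        raise ValueError("need 0 < low_water_pct <= high_water_pct <= 100")
    # Integer math avoids floating-point rounding at the thresholds.
    if used_bytes * 100 < high_water_pct * max_bytes:
        return 0  # below the trigger point: nothing to purge
    target_bytes = max_bytes * low_water_pct // 100
    return max(0, used_bytes - target_bytes)


capacity = 1000 * GB  # the 1 TB repo from the example
purge_90_70 = bytes_to_purge(900 * GB, capacity, 90, 70)
purge_90_87 = bytes_to_purge(900 * GB, capacity)
```

With 90%/70% marks, hitting 900 GB purges 200 GB in one pass and leaves only 
700 GB retained, so about 300 GB of the configured 1 TB sits empty. With 
90%/87% marks, the same trigger purges only 30 GB.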

re: rollover cutoffs, the idea there was that we hope that setting the ageoff 
to 90% will result in not exceeding the max. But as a 'last ditch effort', if 
we cannot keep up, so that we end up significantly exceeding the max configured 
value (110% of this value, as you noted), we will block to avoid writing to the 
repo at all until ageoff occurs. I'm not opposed to changing that to 100% as 
you outline above. However, the PR seems to change it to 90%. I don't think we 
should block writing to the repo just because we hit 90% utilization.
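
As a sketch of the distinction I mean, assume a single threshold that decides 
when writers get blocked. The function name, the parameter, and the use of 
">=" are assumptions for illustration and not the actual rollover code.

```python
def should_block_writes(used_bytes, max_bytes, block_pct=110):
    """Return True if writes should wait for ageoff (hypothetical helper).

    block_pct is the percentage of the configured max at which writers
    are blocked; 110 mirrors the existing behavior described above.
    """
    # Integer comparison avoids floating-point rounding at the boundary.
    return used_bytes * 100 >= block_pct * max_bytes


max_bytes = 1000
blocked_at_90_with_110 = should_block_writes(900, max_bytes)        # existing cutoff
blocked_at_90_with_100 = should_block_writes(900, max_bytes, 100)   # proposed in the ticket
blocked_at_90_with_90 = should_block_writes(900, max_bytes, 90)     # what the PR appears to do
```

Under this sketch, 90% utilization only blocks writers when the cutoff is set 
to 90, which is the case I would like to avoid.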

Thoughts here?



> Provenance Repository - Fix PurgeOldEvent and Rollover Size Limits
> ------------------------------------------------------------------
>
>                 Key: NIFI-3039
>                 URL: https://issues.apache.org/jira/browse/NIFI-3039
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.0.0, 1.1.0, 0.8.0, 0.7.1
>            Reporter: Joe Skora
>            Assignee: Joe Skora
>
> Current {purgeOldEvents} logic triggers cleanup when 90% of space is used, 
> but it only removes one file if usage is under 100%, causing thrashing around 
> 100% usage.  In testing, cleaning up until usage drops to 70% after hitting 
> 90% makes the system run more smoothly.
> Also, {rollover} will not trigger cleanup unless 110% of the allowed space is 
> in use; changing this to 100% also makes a difference in testing.
> Before these changes, a test system that generates huge amounts of provenance 
> would become unstable and stop processing provenance until restarted.  With 
> these changes, the system consistently recovers even under heavy load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
