[
https://issues.apache.org/jira/browse/NIFI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842946#comment-15842946
]
Mark Payne commented on NIFI-1847:
----------------------------------
I have no problem with this proposal, except for the wording "recommend the
max size be changed to a percentage". I would not want to *change* how it
works, but rather give the user the option of choosing one or the other by
introducing a new property (nifi.provenance.repository.max.storage.size would
stay, and nifi.provenance.repository.max.storage.percentage would be added).
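
To make the "one or the other" idea concrete, here is a rough sketch of how the two properties might be resolved. Only the property names come from this comment. The helper name describe_limit, the value format of the percentage property, and the precedence rule (percentage wins when both are set) are assumptions for illustration, not NiFi behavior.

```python
# Hypothetical sketch, not NiFi code. The property names are taken from the
# comment above; everything else here is an assumption.
SIZE_KEY = "nifi.provenance.repository.max.storage.size"
PCT_KEY = "nifi.provenance.repository.max.storage.percentage"


def describe_limit(props):
    """Return (kind, value) for a dict of property name -> string value.

    Assumed precedence: the percentage property wins when both are set.
    """
    pct = props.get(PCT_KEY, "").strip()
    if pct:
        # Accept "90" or "90%" (the accepted format is an assumption).
        value = float(pct.rstrip("%"))
        if not 0 < value <= 100:
            raise ValueError(f"{PCT_KEY} must be in (0, 100], got {pct!r}")
        return ("percentage", value)
    size = props.get(SIZE_KEY, "").strip()
    if size:
        # The size string is left unparsed; existing behavior stays as-is.
        return ("size", size)
    return ("unset", None)
```

With only the size property set, the existing behavior would be kept unchanged; the percentage path would be taken only when the new property is present.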
> improve provenance space utilization
> ------------------------------------
>
> Key: NIFI-1847
> URL: https://issues.apache.org/jira/browse/NIFI-1847
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 0.5.1
> Reporter: Ben Icore
> Assignee: Joe Skora
>
> Currently the max storage size of the provenance repo is specified in bytes.
> This is OK if there is a single provenance repo. If multiple repos are
> specified, the space can be significantly underutilized.
> Consider the following example:
> repo 1 has 500GB of space
> repo 2 has 500GB of space
> The max storage size would likely be set at 900GB, since the combined space
> is 1TB. 900GB seems like a "safe" value, because provenance information is
> generally striped evenly across the repos; however, this is not guaranteed.
> When the max size is considerably larger than the size of any given
> partition, any given partition could easily reach 100%.
> The only safe way to prevent a given partition in the above example from
> filling is to set the max size at, say, 450GB. However, this caps the entire
> provenance repo at 450GB, effectively rendering 550GB of disk space unusable.
> If the repos were of uneven size, say
> repo 1 has 700GB of space
> repo 2 has 300GB of space
> you would have the same 1TB of provenance space, but the individual repos
> are uneven, so the 900GB of storage would definitely cause repo 2 to run out
> of disk space. The only way to ensure that repo 2 did not run out of disk
> space would be to set the max size to 250GB, effectively losing 750GB of
> disk space.
> Recommend the max size be changed to a percentage and applied to the
> individual repos. Provenance records should still be distributed as evenly
> as possible, but if one repo has exceeded its max, information would be
> written to the other.
> so in example 1
> repo 1 has 500GB of space
> repo 2 has 500GB of space
> max space is 90%
> effective and "usable" repo space would be 900GB
> so in example 2
> repo 1 has 700GB of space
> repo 2 has 300GB of space
> max space is 90%
> effective and "usable" repo space would be 900GB
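
The arithmetic in the description above can be checked with a small sketch. The function names and the 90% headroom default are assumptions for illustration. The first function models the current single global cap (worst case: all data lands on the smallest repo), the second models the proposed per-repo percentage.

```python
# Illustrative arithmetic only; names and the 90% default are assumptions.

def safe_global_cap(capacities_gb, max_percent=90):
    """Largest single global cap that cannot overfill the smallest repo,
    assuming the worst case where all data lands on that one repo."""
    return min(capacities_gb) * max_percent / 100


def usable_with_percentage(capacities_gb, max_percent=90):
    """Total usable space when the percentage cap is applied per repo."""
    return sum(c * max_percent / 100 for c in capacities_gb)


for repos in ([500, 500], [700, 300]):
    total = sum(repos)
    capped = safe_global_cap(repos)
    per_repo = usable_with_percentage(repos)
    print(f"repos={repos}: global cap {capped:.0f}GB "
          f"(leaves {total - capped:.0f}GB unused), "
          f"per-repo percentage {per_repo:.0f}GB usable")
```

For the 500GB/500GB case this gives a 450GB global cap, matching the figure in the description. For 700GB/300GB it gives 270GB, slightly above the more conservative 250GB quoted there. In both cases the per-repo percentage yields 900GB of usable space.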
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)