Ben Icore created NIFI-1847:
-------------------------------
Summary: improve provenance space utilization
Key: NIFI-1847
URL: https://issues.apache.org/jira/browse/NIFI-1847
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Affects Versions: 0.5.1
Reporter: Ben Icore
Currently the max storage size of the provenance repo is specified in bytes.
This is OK if there is a single provenance repo. If multiple repos are
specified, the space can be significantly underutilized.
Consider the following example:
repo 1 has 500GB of space
repo 2 has 500GB of space
The max storage size would likely be set at 900GB, since the combined space is 1TB.
900GB seems like a "safe" value, because provenance information is generally
striped evenly across the repos; however, this is not guaranteed. With the max
size considerably larger than the size of any given partition, any given
partition could easily reach 100%.
The only safe way to prevent a given partition in the above example from filling
is to set the max size at, say, 450GB; however, this caps the entire provenance
repo at 450GB, effectively rendering 550GB of disk space unusable.
If the repos were of uneven size, say
repo 1 has 700GB of space
repo 2 has 300GB of space
you would have the same 1TB of provenance space, but the individual repos are
uneven, so the 900GB of storage would definitely cause repo 2 to run out of
disk space. The only way to ensure that repo 2 did not run out of disk space
would be to set the max size to 250GB, effectively losing 750GB of disk space.
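The arithmetic for the uneven case can be sketched roughly as below. This is an illustration only, assuming the worst case in which all provenance data lands on a single partition; the function names are hypothetical and are not part of NiFi.

```python
def worst_case_safe(partitions_gb, global_cap_gb):
    """A single global cap is only safe in the worst case if the
    smallest partition could hold the entire cap by itself."""
    return global_cap_gb <= min(partitions_gb)


def stranded_gb(partitions_gb, global_cap_gb):
    """Disk space that can never be used once the global cap is reached."""
    return sum(partitions_gb) - global_cap_gb


repos = [700, 300]  # GB per repo, from the uneven example

print(worst_case_safe(repos, 900))  # False
print(worst_case_safe(repos, 250))  # True
print(stranded_gb(repos, 250))      # 750
```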
Recommend that the max size be changed to a percentage and applied to the
individual repos. Provenance records should still be distributed as evenly as
possible, but if one repo has exceeded its max, information would be written to
the other.
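One possible shape for this behaviour is sketched below. It is not NiFi's implementation: the Repo class, write_events function, and max_percent parameter are hypothetical names, and a simple round-robin is assumed as the "as evenly as possible" policy.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    capacity_gb: int
    used_gb: int = 0

    def has_room(self, size_gb, max_percent):
        # Each repo is capped at a percentage of its own capacity.
        return self.used_gb + size_gb <= self.capacity_gb * max_percent / 100


def write_events(repos, event_sizes_gb, max_percent):
    """Round-robin over the repos, skipping any repo that would exceed its cap.

    Returns the number of events that could not be placed anywhere.
    """
    unplaced = 0
    start = 0
    for size in event_sizes_gb:
        for offset in range(len(repos)):
            repo = repos[(start + offset) % len(repos)]
            if repo.has_room(size, max_percent):
                repo.used_gb += size
                start = (start + offset + 1) % len(repos)
                break
        else:
            # Every repo is at its cap; a real system would age off old data.
            unplaced += 1
    return unplaced


repos = [Repo("repo 1", 700), Repo("repo 2", 300)]
unplaced = write_events(repos, [1] * 900, max_percent=90)

print([r.used_gb for r in repos])  # [630, 270]
print(unplaced)                    # 0
```

In this sketch writes alternate between the repos until repo 2 reaches its 270GB cap, after which everything goes to repo 1 until it reaches 630GB.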
So in example 1:
repo 1 has 500GB of space
repo 2 has 500GB of space
max space is 90%
effective and "usable" repo space would be 900GB
So in example 2:
repo 1 has 700GB of space
repo 2 has 300GB of space
max space is 90%
effective and "usable" repo space would be 900GB
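The 900GB figure in both cases can be checked with a short calculation. This is a minimal sketch, assuming the percentage is applied to each repo's own capacity and rounded down to whole GB; the function names are hypothetical.

```python
def per_repo_caps_gb(partitions_gb, max_percent):
    """Cap for each repo: a percentage of that repo's own capacity."""
    return [size * max_percent // 100 for size in partitions_gb]


def effective_capacity_gb(partitions_gb, max_percent):
    """Total usable provenance space across all repos."""
    return sum(per_repo_caps_gb(partitions_gb, max_percent))


print(per_repo_caps_gb([500, 500], 90))       # [450, 450]
print(effective_capacity_gb([500, 500], 90))  # 900
print(per_repo_caps_gb([700, 300], 90))       # [630, 270]
print(effective_capacity_gb([700, 300], 90))  # 900
```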