Ben Icore created NIFI-1847:
-------------------------------

             Summary: improve provenance space utilization
                 Key: NIFI-1847
                 URL: https://issues.apache.org/jira/browse/NIFI-1847
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 0.5.1
            Reporter: Ben Icore


Currently, the maximum storage size of the provenance repo is specified in
bytes. This is fine if there is a single provenance repo. If multiple repos are
specified, the space can be significantly underutilized.

Consider the following examples:

repo 1 has 500GB of space
repo 2 has 500GB of space

The max storage size would likely be set at 900GB, since the combined space is
1TB. 900GB seems like a "safe" value because provenance information is
generally striped evenly across the repos; however, this is not guaranteed.
Because the max size is considerably larger than the size of any single
partition, any one partition could easily reach 100%.

The only safe way to prevent a given partition in the above example from
filling is to set the max size to, say, 450GB. However, this caps the entire
provenance repo at 450GB, effectively rendering 550GB of the 1TB of disk space
unusable.

If the repos were of uneven size, say

repo 1 has 700GB of space
repo 2 has 300GB of space

you would have the same 1TB of provenance space, but the individual repos are
uneven, so a 900GB max storage size would definitely cause repo 2 to run out of
disk space (with even striping, each repo receives about 450GB, which exceeds
repo 2's 300GB). The only way to ensure that repo 2 does not run out of disk
space would be to set the max size to 250GB, effectively losing 750GB of disk
space.
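To make the arithmetic above concrete, here is a minimal Python sketch of the
current single-cap model. It assumes perfectly even striping (which, as noted,
is not guaranteed) and sizes in whole GB; the function names are hypothetical
and this is not NiFi code.

```python
from typing import List, Sequence


def overfilled_repos(global_cap_gb: int, repo_sizes_gb: Sequence[int]) -> List[int]:
    """Indexes of repos that would exceed their disk size if data up to
    global_cap_gb were striped perfectly evenly across all repos
    (an assumption; real striping is not guaranteed to be even)."""
    share_gb = global_cap_gb / len(repo_sizes_gb)  # equal slice per repo
    return [i for i, size in enumerate(repo_sizes_gb) if share_gb > size]


def stranded_space_gb(global_cap_gb: int, repo_sizes_gb: Sequence[int]) -> int:
    """Disk space that can never be used because the single cap is below the total."""
    return sum(repo_sizes_gb) - global_cap_gb


if __name__ == "__main__":
    print(overfilled_repos(900, [700, 300]))   # [1]: repo 2 gets ~450GB on a 300GB disk
    print(stranded_space_gb(450, [500, 500]))  # 550
    print(stranded_space_gb(250, [700, 300]))  # 750
```

Even with the optimistic even-striping assumption, a 900GB cap overfills the
300GB repo, and the conservative caps leave hundreds of GB idle.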

I recommend changing the max size to a percentage that is applied to each
individual repo. Provenance records should still be distributed as evenly as
possible, but if one repo has exceeded its max, information would be written to
the other repos.
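One possible shape for the proposal, as a minimal Python sketch: writes go
round-robin across repos (one reading of "as evenly as possible"), and any repo
that has reached its own percentage limit is skipped so the write lands on
another repo. The class and field names are hypothetical, usage is tracked in
whole GB for readability rather than bytes, and returning None when every repo
is full stands in for whatever age-off behaviour the real repository would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Repo:
    name: str
    capacity_gb: int
    used_gb: int = 0  # simplification: real code would track bytes on disk


class PercentCapSelector:
    """Round-robin over repos, skipping any repo at max_percent of its own capacity."""

    def __init__(self, repos: List[Repo], max_percent: int) -> None:
        self.repos = repos
        self.max_percent = max_percent
        self._next = 0  # index of the repo whose turn it is

    def _limit_gb(self, repo: Repo) -> float:
        return repo.capacity_gb * self.max_percent / 100

    def _has_room(self, repo: Repo, size_gb: int) -> bool:
        return repo.used_gb + size_gb <= self._limit_gb(repo)

    def write(self, size_gb: int) -> Optional[str]:
        """Record size_gb on the next repo with room; None if every repo is at its cap."""
        n = len(self.repos)
        for offset in range(n):
            idx = (self._next + offset) % n
            repo = self.repos[idx]
            if self._has_room(repo, size_gb):
                repo.used_gb += size_gb
                self._next = (idx + 1) % n  # continue round-robin after this repo
                return repo.name
        return None  # hypothetical signal: caller must age off old data


if __name__ == "__main__":
    repos = [Repo("repo1", 700), Repo("repo2", 300)]
    selector = PercentCapSelector(repos, max_percent=90)
    written = 0
    while selector.write(1) is not None:
        written += 1
    print(f"{written} GB written; repo1={repos[0].used_gb}, repo2={repos[1].used_gb}")
    # prints "900 GB written; repo1=630, repo2=270"
```

Round-robin keeps the repos balanced until the smaller one reaches 90%, after
which all new records overflow to the larger repo.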

So, in example 1:
repo 1 has 500GB of space
repo 2 has 500GB of space
max space is 90%
effective and "usable" repo space would be 900GB

So, in example 2:
repo 1 has 700GB of space
repo 2 has 300GB of space
max space is 90%
effective and "usable" repo space would be 900GB
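The usable-space figures in both examples can be checked with a small Python
sketch that applies the percentage to each repo separately. The function names
are hypothetical, and sizes and percentages are assumed to be whole numbers so
integer arithmetic stays exact.

```python
from typing import List, Sequence


def per_repo_limits_gb(repo_sizes_gb: Sequence[int], max_percent: int) -> List[int]:
    """Cap for each repo when max_percent is applied to that repo's own size."""
    return [size * max_percent // 100 for size in repo_sizes_gb]


def usable_space_gb(repo_sizes_gb: Sequence[int], max_percent: int) -> int:
    """Total usable provenance space under per-repo percentage caps."""
    return sum(per_repo_limits_gb(repo_sizes_gb, max_percent))


if __name__ == "__main__":
    print(per_repo_limits_gb([700, 300], 90))  # [630, 270]
    print(usable_space_gb([500, 500], 90))     # 900
    print(usable_space_gb([700, 300], 90))     # 900
```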



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
