[ 
https://issues.apache.org/jira/browse/NIFI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842946#comment-15842946
 ] 

Mark Payne commented on NIFI-1847:
----------------------------------

I have no problem with this proposal, except for the wording "recommend the 
max size be changed to a percentage". I would not want to *change* how it 
works, but rather give the user the option of choosing one or the other by 
introducing a new property (nifi.provenance.repository.max.storage.size would 
stay, and nifi.provenance.repository.max.storage.percentage would be added).
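
To make the difference between the two modes concrete, here is a minimal 
Python sketch of the arithmetic from the issue description. It is an 
illustration only, not NiFi code: the function names are hypothetical, sizes 
are in GB, and the percentage is assumed to apply to each repo individually, 
as the proposal describes.

```python
def percentage_caps(repo_sizes_gb, max_percent):
    """Per-repo caps if the limit is a percentage of each repo's own size.

    Models the proposed nifi.provenance.repository.max.storage.percentage
    property (an assumption; the property is only proposed in this issue).
    """
    return [size * max_percent / 100 for size in repo_sizes_gb]


def largest_safe_global_cap(repo_sizes_gb):
    """Upper bound on a single global size limit that cannot fill a repo.

    Models the existing nifi.provenance.repository.max.storage.size
    behavior under the worst case named in the issue: even striping is
    not guaranteed, so all records could land on the smallest repo.
    """
    return min(repo_sizes_gb)


if __name__ == "__main__":
    for repos in ([500, 500], [700, 300]):
        caps = percentage_caps(repos, 90)
        print(repos, "per-repo caps:", caps, "usable total:", sum(caps))
        print(repos, "safe global cap at most:", largest_safe_global_cap(repos))
```

With a 90% setting, both layouts from the description come out to 900GB of 
usable space, while a single global limit has to stay below the size of the 
smallest repo to be safe.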

> improve provenance space utilization
> ------------------------------------
>
>                 Key: NIFI-1847
>                 URL: https://issues.apache.org/jira/browse/NIFI-1847
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 0.5.1
>            Reporter: Ben Icore
>            Assignee: Joe Skora
>
> Currently the max storage size of the provenance repo is specified in bytes.  
> This is OK if there is a single provenance repo.  If multiple repos are 
> specified, the space can be significantly underutilized.
> Consider the following examples:
> repo 1 has 500GB of space
> repo 2 has 500GB of space
> Max storage size would likely be set at 900GB, since the combined space is 
> 1TB.  900GB seems like a "safe" value, because provenance information is 
> generally striped evenly across the repos; however, this is not guaranteed.  
> With the max size considerably larger than the size of any given 
> partition, any given partition could easily reach 100%.
> The only safe way to prevent a given partition in the above example from 
> filling is to set the max size at, say, 450GB; however, this caps the entire 
> provenance repo at 450GB, effectively rendering 550GB of disk space unusable.
> If the repo sizes were uneven, say:
> repo 1 has 700GB of space
> repo 2 has 300GB of space
> you would have the same 1TB of provenance space, but the individual repos 
> are uneven, so 900GB of storage would definitely cause repo 2 to run out 
> of disk space.  The only way to ensure that repo 2 did not run out of disk 
> space would be to set the max size to 250GB, effectively losing 750GB of 
> disk space.
> Recommend the max size be changed to a percentage and applied to the 
> individual repos.  Provenance records should still be distributed as evenly 
> as possible, but if one repo has exceeded its max, information would be 
> written to the other.
> So in example 1:
> repo 1 has 500GB of space
> repo 2 has 500GB of space
> max space is 90%
> effective and "usable" repo space would be 900GB
> So in example 2:
> repo 1 has 700GB of space
> repo 2 has 300GB of space
> max space is 90%
> effective and "usable" repo space would be 900GB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
