[ 
https://issues.apache.org/jira/browse/NIFI-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann resolved NIFI-15570.
-------------------------------------
    Fix Version/s: 2.9.0
       Resolution: Fixed

> Partial defragmentation of Content Repository via tail-claim truncation
> -----------------------------------------------------------------------
>
>                 Key: NIFI-15570
>                 URL: https://issues.apache.org/jira/browse/NIFI-15570
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 2.9.0
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> h3. Problem
> NiFi's FileSystemRepository uses a slab-allocation strategy for storing 
> FlowFile content: multiple FlowFiles are written sequentially into a single 
> ResourceClaim file on disk. This is efficient because it avoids the overhead 
> of creating and deleting huge numbers of small files. However, it introduces 
> a fragmentation problem. When any FlowFile still references a ResourceClaim, 
> the entire file must be kept on disk — even if the vast majority of its bytes 
> belong to FlowFiles that have already been removed. Consider a ResourceClaim 
> that contains five ContentClaims of sizes 1 KB, 2 KB, 4 KB, 3 KB, and 1 GB. 
> If only the 1 KB FlowFile remains, the full ~1 GB file stays on disk. At 
> scale, this leads to disk exhaustion.
> A full defragmentation (rewriting live claims into new ResourceClaim files, 
> updating all references, and deleting the originals) would be extremely 
> complex and expensive. But it turns out we can solve the vast majority of 
> the problem without it.
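The slab layout and the waste it can create can be modeled in a few lines of Java. The record and method below are illustrative stand-ins for the idea, not NiFi's actual classes:

```java
import java.util.List;

// Illustrative model, not NiFi's actual classes: a ResourceClaim is one slab
// file on disk; each ContentClaim is an (offset, length) slice of it.
public class SlabWaste {
    record ContentClaim(long offset, long length, boolean live) {}

    // Bytes that cannot be reclaimed while any claim in the slab is live:
    // the whole file minus the bytes belonging to live claims.
    static long wastedBytes(long fileSize, List<ContentClaim> claims) {
        long liveBytes = claims.stream()
                .filter(ContentClaim::live)
                .mapToLong(ContentClaim::length)
                .sum();
        return fileSize - liveBytes;
    }

    public static void main(String[] args) {
        long kb = 1024;
        // The ticket's example: 1 KB, 2 KB, 4 KB, 3 KB, then a ~1 GB claim;
        // only the first 1 KB claim is still referenced by a FlowFile.
        List<ContentClaim> claims = List.of(
                new ContentClaim(0, 1 * kb, true),
                new ContentClaim(1 * kb, 2 * kb, false),
                new ContentClaim(3 * kb, 4 * kb, false),
                new ContentClaim(7 * kb, 3 * kb, false),
                new ContentClaim(10 * kb, 1_000_000 * kb, false));
        long fileSize = 1_000_010 * kb;
        System.out.println(wastedBytes(fileSize, claims) / kb); // prints 1000009
    }
}
```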
> h3. Key Insight
> With NiFi's slab allocation, there are three possible positions for a large 
> ContentClaim within a ResourceClaim:
>  
> {{<> = Small FlowFile}}
> {{[................] = Large FlowFile}}
>  
> {{1. Beginning:   [................]<><><><><><><>}}
> {{2. Middle:      <><><><>[................]<><><>}}
> {{3. End:         <><><><><><><><>[................]}}
> NiFi already prevents cases 1 and 2. The 
> nifi.content.claim.max.appendable.size property (default: 50 KB) causes the 
> repository to stop appending to a ResourceClaim once it exceeds that 
> threshold. Since a "large" ContentClaim is by definition larger than this 
> threshold, the act of writing it will push the ResourceClaim past the (soft) 
> limit, causing the ResourceClaim to be closed for further appending. No 
> additional ContentClaims can be written after the large one.
> This means a large ContentClaim can only ever appear at the tail of a 
> ResourceClaim. And truncating a file from the tail requires no data movement 
> — it is a single FileChannel.truncate() call.
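A minimal sketch of the tail-truncation primitive itself, using a throwaway temp file rather than NiFi's repository layout; the file name and claim sizes are invented for illustration:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TailTruncate {
    // Writes four small "claims" followed by one large tail claim into the
    // slab file, then reclaims the large claim's bytes with one truncate().
    static long writeAndTruncate(Path slab) throws IOException {
        try (FileChannel ch = FileChannel.open(slab, StandardOpenOption.WRITE)) {
            for (int i = 0; i < 4; i++) {
                ch.write(ByteBuffer.wrap(new byte[10])); // small claims
            }
            long largeOffset = ch.position(); // 40: where the large claim begins
            ch.write(ByteBuffer.wrap(new byte[1_000])); // the large tail claim
            // Once no FlowFile references the large claim, a single syscall
            // reclaims its bytes; no data movement is needed:
            ch.truncate(largeOffset);
        }
        return Files.size(slab);
    }

    public static void main(String[] args) throws IOException {
        Path slab = Files.createTempFile("resource-claim", ".bin");
        System.out.println(writeAndTruncate(slab)); // prints 40
        Files.delete(slab);
    }
}
```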
> h3. Solution
> This change implements "partial defragmentation" by truncating ResourceClaim 
> files from the tail when the last (large) ContentClaim is removed. The 
> approach consists of several coordinated components:
> *Marking truncation candidates at write time* — When a ContentClaim is 
> closed in FileSystemRepository, the repository checks whether it is both (a) 
> large (exceeding a threshold) and (b) at a non-zero offset (i.e., not the 
> only claim in the file). If both conditions hold, the claim is flagged as a 
> truncation candidate via StandardContentClaim.setTruncationCandidate(true). 
> If the claim is later cloned (claimant count incremented), the flag is 
> cleared, since truncation is only safe when the claim has a single owner.
> *Routing truncatable claims through the FlowFile Repository* — When a 
> FlowFile is deleted or its content is replaced, 
> WriteAheadFlowFileRepository.updateContentClaims() checks whether the 
> released ContentClaim is a truncation candidate. If so (and the 
> ResourceClaim itself is not already fully destructible), the claim is queued 
> in claimsAwaitingTruncation. On the next WAL checkpoint or sync, these 
> claims are drained to ResourceClaimManager.markTruncatable().
> *Background truncation in FileSystemRepository* — A scheduled TruncateClaims 
> task periodically drains truncatable claims from the ResourceClaimManager. 
> Before truncating, it checks whether truncation is active for the claim's 
> container (the archive must have been cleared on the last cleanup run and 
> disk usage must exceed the configured threshold). If the conditions are met, 
> the file is truncated to the claim's offset via FileChannel.truncate(). If 
> not, the claims are saved in a TruncationClaimManager and retried on 
> subsequent runs, ensuring no truncation opportunity is lost.
> *Recovery* — On restart, WriteAheadFlowFileRepository.restoreFlowFiles() 
> re-derives truncation eligibility by scanning all recovered FlowFiles, 
> identifying large claims at non-zero offsets that are at the tail of their 
> ResourceClaim and are not shared by multiple FlowFiles.
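The write-time candidacy check described above could be sketched roughly as follows. The class name, method signature, and inlined threshold are simplified assumptions for illustration, not the actual FileSystemRepository code:

```java
// Hypothetical sketch of the write-time truncation-candidacy check. The
// class, method, and threshold handling are simplified stand-ins; NiFi's
// real logic lives in FileSystemRepository and StandardContentClaim.
public class TruncationCandidateCheck {
    // Default for nifi.content.claim.max.appendable.size (50 KB).
    static final long MAX_APPENDABLE_SIZE = 50 * 1024;

    static boolean isTruncationCandidate(long claimOffset, long claimLength,
                                         int claimantCount) {
        // Large: the claim alone exceeds the appendable threshold, so the
        // ResourceClaim was closed for appending and nothing follows it.
        boolean large = claimLength > MAX_APPENDABLE_SIZE;
        // Non-zero offset: smaller claims precede it, so the file must stay
        // alive for them; only the tail is reclaimable.
        boolean hasPredecessors = claimOffset > 0;
        // Single owner: a cloned claim may still be read via another
        // FlowFile, so truncation would not be safe.
        boolean singleOwner = claimantCount == 1;
        return large && hasPredecessors && singleOwner;
    }
}
```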
> h3. Example
> Before truncation — a 1 GB FlowFile was removed but the ResourceClaim 
> persists because small FlowFiles still reference it:
>  
> {{ResourceClaim file (1,000,010 KB on disk):}}
> {{  [1 KB] [2 KB] [4 KB] [3 KB] [1,000,000 KB (removed)]}}
>  
> After truncation — the file is truncated at the offset where the large 
> claim began:
>  
> {{ResourceClaim file (10 KB on disk):}}
> {{  [1 KB] [2 KB] [4 KB] [3 KB]}}
>  
> The small FlowFiles remain fully readable. The 1 GB of wasted space is 
> reclaimed instantly with a single syscall.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
