Ashish Kumar created HDDS-14921:
-----------------------------------

             Summary: Improve space accounting in SCM with In-Flight container allocation tracking
                 Key: HDDS-14921
                 URL: https://issues.apache.org/jira/browse/HDDS-14921
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Ashish Kumar
            Assignee: Ashish Kumar


The current disk space management and container allocation mechanism in SCM
(Storage Container Manager) relies heavily on periodic DataNode (DN) heartbeat
reports and on static policies, such as deriving the pipeline count from the
number of disks. This approach introduces several systemic challenges:
 # *Stale Space Visibility*
SCM makes allocation decisions based on heartbeat-reported disk space, which 
can lag behind actual usage. During high allocation rates, this delay leads to 
inaccurate space estimation and potential over-allocation.
 # *Burst Allocation Risk*
Rapid container allocations within short intervals are not accounted for 
immediately, allowing multiple allocations against the same reported free 
space. This can oversubscribe disks and result in sudden disk exhaustion.

*Solution:*

A two-window tumbling bucket, similar to the approach used in HADOOP-3707.

*Two Windows Per DataNode*
Each DataNode has a TwoWindowBucket containing:
- currentWindow: Containers allocated in the current 10-minute interval
- previousWindow: Containers from the previous 10-minute interval
- lastRollTime: Timestamp of last roll
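A minimal Java sketch of this per-DataNode state, under stated assumptions: TwoWindowBucket follows the name used above, but PendingAllocationTracker, the long container IDs and the String DataNode key are hypothetical stand-ins for the real SCM types (ContainerID, DatanodeDetails). It only shows the data layout; the operations are sketched separately below each flow.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-DataNode state. Container IDs are plain longs and DataNodes
// are keyed by a String ID; the real SCM types (ContainerID, DatanodeDetails)
// are not used. The bucket itself is not thread-safe.
final class TwoWindowBucket {
  // Length of one window: 10 minutes, in milliseconds.
  static final long ROLL_INTERVAL_MS = 10 * 60 * 1000L;

  Set<Long> currentWindow = new HashSet<>();  // allocated in the current interval
  Set<Long> previousWindow = new HashSet<>(); // allocated in the previous interval
  long lastRollTime;                          // epoch millis of the last roll

  TwoWindowBucket(long now) {
    this.lastRollTime = now;
  }
}

// Keeps one bucket per DataNode, creating it on first use.
final class PendingAllocationTracker {
  private final Map<String, TwoWindowBucket> buckets = new ConcurrentHashMap<>();

  TwoWindowBucket bucketFor(String datanodeId, long now) {
    return buckets.computeIfAbsent(datanodeId, id -> new TwoWindowBucket(now));
  }
}
```

Only the map is concurrent here; a real implementation would also need to guard each bucket, since allocation, space checks and container reports can arrive on different threads.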

*Container Allocation Flow: New Container Allocated*
   - Check if a roll is needed (now >= lastRollTime + 10 min)
   - If yes: previousWindow = currentWindow; currentWindow = {}
   - Add ContainerID to currentWindow
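The allocation step can be sketched as follows, assuming a hypothetical AllocationBucket class, container IDs modelled as longs, and an injectable clock (java.util.function.LongSupplier returning epoch millis) so the windows can be exercised without waiting ten minutes. This sketch performs the roll check before adding the new ID, so a freshly allocated container always starts in currentWindow and stays pending for between 10 and 20 minutes (longer if the lazy roll is delayed).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical sketch of the allocation path. Not thread-safe: callers are
// assumed to hold a lock on the bucket while calling it.
final class AllocationBucket {
  static final long ROLL_INTERVAL_MS = 10 * 60 * 1000L;

  private final LongSupplier clock; // returns "now" in epoch millis
  Set<Long> currentWindow = new HashSet<>();
  Set<Long> previousWindow = new HashSet<>();
  long lastRollTime;

  AllocationBucket(LongSupplier clock) {
    this.clock = clock;
    this.lastRollTime = clock.getAsLong();
  }

  /** Records a container that SCM has just allocated on this DataNode. */
  void onContainerAllocated(long containerId) {
    rollIfNeeded();                 // age out old entries first ...
    currentWindow.add(containerId); // ... so the new ID always lands in currentWindow
  }

  private void rollIfNeeded() {
    long now = clock.getAsLong();
    if (now - lastRollTime >= ROLL_INTERVAL_MS) {
      previousWindow = currentWindow; // old previousWindow becomes garbage
      currentWindow = new HashSet<>();
      lastRollTime = now;
    }
  }
}
```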

*Space check: Get Pending Allocations*                                 
   - Roll if needed                                              
   - Return UNION(currentWindow, previousWindow)                 
   - pendingSize = union.size() × maxContainerSize               
   - effectiveSpace = remainingSpace - pendingSize   
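A sketch of the space check, assuming a hypothetical SpaceCheckBucket with the same two-window layout and an injectable clock. remainingSpace is assumed to be the value from the DataNode's last heartbeat, and maxContainerSize is passed in rather than read from configuration. Taking the union means an ID is counted once even if it were ever present in both windows.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical sketch of the space check; the roll logic mirrors the allocation path.
final class SpaceCheckBucket {
  static final long ROLL_INTERVAL_MS = 10 * 60 * 1000L;

  private final LongSupplier clock; // returns "now" in epoch millis
  Set<Long> currentWindow = new HashSet<>();
  Set<Long> previousWindow = new HashSet<>();
  long lastRollTime;

  SpaceCheckBucket(LongSupplier clock) {
    this.clock = clock;
    this.lastRollTime = clock.getAsLong();
  }

  /** UNION(currentWindow, previousWindow), after a lazy roll. */
  Set<Long> pendingContainers() {
    rollIfNeeded();
    Set<Long> union = new HashSet<>(currentWindow);
    union.addAll(previousWindow);
    return union;
  }

  /** Heartbeat-reported free space minus the worst case for in-flight containers. */
  long effectiveSpace(long remainingSpace, long maxContainerSize) {
    long pendingSize = (long) pendingContainers().size() * maxContainerSize;
    return remainingSpace - pendingSize; // negative means already oversubscribed
  }

  private void rollIfNeeded() {
    long now = clock.getAsLong();
    if (now - lastRollTime >= ROLL_INTERVAL_MS) {
      previousWindow = currentWindow;
      currentWindow = new HashSet<>();
      lastRollTime = now;
    }
  }
}
```

For example, with three pending containers, a 5 GB maxContainerSize and 100 GB of reported remaining space, effectiveSpace is 100 - 3 × 5 = 85 GB.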

*Container report: Container Report Received*
   - Remove ContainerID from BOTH windows
   - More accurate than waiting for automatic aging, since the report confirms the container on the DataNode
   - Falls back to aging if the report is delayed or missed
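Handling a container report might look like the sketch below. ReportBucket and onContainerReport are hypothetical names, and the lazy roll is omitted to keep the example focused on removal. Removing from both windows is needed because the bucket does not record which window holds a given ID.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical bucket showing only the report path; the lazy roll is omitted.
final class ReportBucket {
  final Set<Long> currentWindow = new HashSet<>();
  final Set<Long> previousWindow = new HashSet<>();

  /** Called once per container report received from this DataNode. */
  void onContainerReport(Iterable<Long> reportedContainerIds) {
    for (Long id : reportedContainerIds) {
      // The report confirms the container on the DataNode, so it is no longer
      // in flight. Check both windows: we do not track which one holds it.
      currentWindow.remove(id);
      previousWindow.remove(id);
    }
  }
}
```

IDs that were never pending (for example, older containers that appear in every report) are simply ignored, because Set.remove is a no-op for absent elements.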

*Automatic Aging: Roll*

Every 10 Minutes (Triggered Lazily on Operations):              
   1. previousWindow = currentWindow                              
   2. currentWindow = {} (new empty set)                          
   3. lastRollTime = now                                          
   4. Old previousWindow is garbage collected                     
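The roll step, sketched with an explicit now argument passed at the start of every operation. RollingBucket is hypothetical, and the branch for an idle period of two or more intervals is an assumption not stated in the proposal: because the roll is lazy, a DataNode with no operations for, say, 25 minutes would otherwise shift its stale currentWindow into previousWindow and keep those IDs pending for another 10 minutes. Clearing both windows in that case leaves them in the same state that on-time rolls would have.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical bucket showing only the roll. Callers pass the current time
// (epoch millis) at the start of every operation, which makes the roll lazy.
final class RollingBucket {
  static final long ROLL_INTERVAL_MS = 10 * 60 * 1000L;

  Set<Long> currentWindow = new HashSet<>();
  Set<Long> previousWindow = new HashSet<>();
  long lastRollTime;

  RollingBucket(long now) {
    this.lastRollTime = now;
  }

  void rollIfNeeded(long now) {
    long elapsed = now - lastRollTime;
    if (elapsed < ROLL_INTERVAL_MS) {
      return; // still inside the current 10-minute window
    }
    if (elapsed >= 2 * ROLL_INTERVAL_MS) {
      // ASSUMPTION (not in the proposal): after two or more idle intervals,
      // every tracked ID would already have aged out under on-time rolls.
      previousWindow = new HashSet<>();
    } else {
      previousWindow = currentWindow; // step 1; the old previousWindow becomes garbage (step 4)
    }
    currentWindow = new HashSet<>();  // step 2: start a new empty window
    lastRollTime = now;               // step 3: remember when we rolled
  }
}
```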
                                                                  

+*Timeline Example*+

Time  | Action                 | CurrentWindow | PreviousWindow | Total Pending
------+------------------------+---------------+----------------+------------------
00:00 | Allocate Container-1   | {C1}          | {}             | {C1}
00:05 | Allocate Container-2   | {C1, C2}      | {}             | {C1, C2}
00:08 | Allocate Container-3   | {C1, C2, C3}  | {}             | {C1, C2, C3}
00:10 | [ROLL] Window tumbles  | {}            | {C1, C2, C3}   | {C1, C2, C3}
      |  ⤷ previousWindow ← currentWindow
      |  ⤷ currentWindow ← {} (reset)
------+------------------------+---------------+----------------+------------------
00:12 | Allocate Container-4   | {C4}          | {C1, C2, C3}   | {C1, C2, C3, C4}
00:15 | Report confirms C1     | {C4}          | {C2, C3}       | {C2, C3, C4}
      |  ⤷ Explicitly removed from previousWindow
00:18 | Allocate Container-5   | {C4, C5}      | {C2, C3}       | {C2, C3, C4, C5}
00:20 | [ROLL] Window tumbles  | {}            | {C4, C5}       | {C4, C5}
      |  ⤷ C2, C3 aged out (not reported before the second roll after allocation)
------+------------------------+---------------+----------------+------------------
00:25 | Report confirms C4     | {}            | {C5}           | {C5}
00:30 | [ROLL] Window tumbles  | {}            | {}             | {}
      |  ⤷ C5 aged out (not reported before the second roll after allocation)
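To check the timeline end to end, the sketch below replays it against a hypothetical TimelineBucket driven by a hypothetical ManualClock. Because rolls are lazy, each [ROLL] row is modelled as a pending() call at 00:10, 00:20 and 00:30, and a roll fires once at least 10 minutes have elapsed since lastRollTime. Container Cn is written as the long n, and replay() returns the Total Pending column row by row.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongSupplier;

// Hypothetical test clock: time is set explicitly, in minutes since 00:00.
final class ManualClock implements LongSupplier {
  private long nowMs;

  void setMinutes(long minutes) {
    nowMs = minutes * 60_000L;
  }

  @Override
  public long getAsLong() {
    return nowMs;
  }
}

// Hypothetical two-window bucket combining allocate, report, pending and a lazy roll.
final class TimelineBucket {
  static final long ROLL_INTERVAL_MS = 10 * 60_000L;

  private final LongSupplier clock;
  private Set<Long> currentWindow = new HashSet<>();
  private Set<Long> previousWindow = new HashSet<>();
  private long lastRollTime;

  TimelineBucket(LongSupplier clock) {
    this.clock = clock;
    this.lastRollTime = clock.getAsLong();
  }

  void allocate(long containerId) {
    rollIfNeeded();
    currentWindow.add(containerId);
  }

  void report(long containerId) {
    rollIfNeeded();
    currentWindow.remove(containerId);
    previousWindow.remove(containerId);
  }

  // UNION(currentWindow, previousWindow); a sorted copy keeps output readable.
  Set<Long> pending() {
    rollIfNeeded();
    Set<Long> union = new TreeSet<>(currentWindow);
    union.addAll(previousWindow);
    return union;
  }

  private void rollIfNeeded() {
    long now = clock.getAsLong();
    if (now - lastRollTime >= ROLL_INTERVAL_MS) {
      previousWindow = currentWindow; // the old previousWindow becomes garbage
      currentWindow = new HashSet<>();
      lastRollTime = now;
    }
  }
}

final class TimelineReplay {
  // Replays the timeline and returns the "Total Pending" column, one entry per row.
  static List<Set<Long>> replay() {
    ManualClock clock = new ManualClock(); // starts at 00:00
    TimelineBucket dn = new TimelineBucket(clock);
    List<Set<Long>> totals = new ArrayList<>();

    clock.setMinutes(0);  dn.allocate(1L); totals.add(dn.pending());
    clock.setMinutes(5);  dn.allocate(2L); totals.add(dn.pending());
    clock.setMinutes(8);  dn.allocate(3L); totals.add(dn.pending());
    clock.setMinutes(10);                  totals.add(dn.pending()); // roll
    clock.setMinutes(12); dn.allocate(4L); totals.add(dn.pending());
    clock.setMinutes(15); dn.report(1L);   totals.add(dn.pending());
    clock.setMinutes(18); dn.allocate(5L); totals.add(dn.pending());
    clock.setMinutes(20);                  totals.add(dn.pending()); // roll
    clock.setMinutes(25); dn.report(4L);   totals.add(dn.pending());
    clock.setMinutes(30);                  totals.add(dn.pending()); // roll
    return totals;
  }
}
```

Each entry of the returned list matches the corresponding Total Pending cell, ending with an empty set at 00:30.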



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
