[ 
https://issues.apache.org/jira/browse/HDDS-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-14921:
----------------------------------
    Labels: pull-request-available  (was: )

> Improve space accounting in SCM with In-Flight container allocation tracking
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-14921
>                 URL: https://issues.apache.org/jira/browse/HDDS-14921
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Ashish Kumar
>            Assignee: Ashish Kumar
>            Priority: Major
>              Labels: pull-request-available
>
> The current disk space management and container allocation mechanism in SCM 
> (Storage Container Manager) relies heavily on periodic DataNode (DN) 
> heartbeat reports and static policies such as pipeline counts derived from 
> the number of disks. This approach introduces multiple systemic challenges:
>  # *Stale Space Visibility*
> SCM makes allocation decisions based on heartbeat-reported disk space, which 
> can lag behind actual usage. During high allocation rates, this delay leads 
> to inaccurate space estimation and potential over-allocation.
>  # *Burst Allocation Risk*
> Rapid container allocations within short intervals are not accounted for 
> immediately, allowing multiple allocations against the same reported free 
> space. This can oversubscribe disks and result in sudden disk exhaustion.
> *Solution:*
> A two-window tumbling bucket, similar to HADOOP-3707
> *Two Windows Per DataNode*
> Each DataNode has a TwoWindowBucket containing:
> - currentWindow: Containers allocated in the current 10-minute interval
> - previousWindow: Containers from the previous 10-minute interval
> - lastRollTime: Timestamp of last roll
> *Container Allocation Flow: New Container Allocated*
>    - Add ContainerID to currentWindow                            
>    - Check if roll needed (time > lastRollTime + 10min)          
>    - If yes: previousWindow = currentWindow; currentWindow = {}
> *Space check: Get Pending Allocations*                                 
>    - Roll if needed                                              
>    - Return UNION(currentWindow, previousWindow)                 
>    - pendingSize = union.size() × maxContainerSize               
>    - effectiveSpace = remainingSpace - pendingSize   
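The space check above is plain arithmetic, so a small worked example may help. The following is a minimal sketch, not SCM code: the class and method names are hypothetical, and the 5 GiB container size and 100 GiB of remaining space are illustrative values only. The description does not say whether a negative result should be clamped to zero, so the sketch returns the raw difference.

```java
/** Hypothetical helper illustrating the space-check arithmetic; not an SCM class. */
final class EffectiveSpaceSketch {
  private EffectiveSpaceSketch() { }

  /** pendingSize = union.size() x maxContainerSize (fails loudly on overflow). */
  static long pendingSize(int pendingContainers, long maxContainerSize) {
    return Math.multiplyExact((long) pendingContainers, maxContainerSize);
  }

  /**
   * effectiveSpace = remainingSpace - pendingSize.
   * The result may be negative when pending allocations exceed the reported
   * free space; clamping is left to the caller because the description does
   * not specify it.
   */
  static long effectiveSpace(long remainingSpace, int pendingContainers,
      long maxContainerSize) {
    return remainingSpace - pendingSize(pendingContainers, maxContainerSize);
  }
}
```

With 3 pending containers of at most 5 GiB each and 100 GiB reported as remaining, pendingSize is 15 GiB and effectiveSpace is 85 GiB.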
> *Container report: Container Report Received*
>    - Remove ContainerID from BOTH windows
>    - More accurate than waiting for automatic aging
>    - Falls back to aging if report is delayed/missed
> *Automatic Aging: Roll*
> Every 10 Minutes (Triggered Lazily on Operations):              
>    1. previousWindow = currentWindow                              
>    2. currentWindow = {} (new empty set)                          
>    3. lastRollTime = now                                          
>    4. Old previousWindow is garbage collected                     
>                                                                   
> +*Timeline Example*+
> Time  | Action                | CurrentWindow | PreviousWindow | Total Pending
> ------+-----------------------+---------------+----------------+------------------
> 00:00 | Allocate Container-1  | {C1}          | {}             | {C1}
> 00:05 | Allocate Container-2  | {C1, C2}      | {}             | {C1, C2}
> 00:08 | Allocate Container-3  | {C1, C2, C3}  | {}             | {C1, C2, C3}
> 00:10 | [ROLL] Window tumbles | {}            | {C1, C2, C3}   | {C1, C2, C3}
>       |  ⤷ previousWindow ← currentWindow
>       |  ⤷ currentWindow ← {} (reset)
> ------+-----------------------+---------------+----------------+------------------
> 00:12 | Allocate Container-4  | {C4}          | {C1, C2, C3}   | {C1, C2, C3, C4}
> 00:15 | Report confirms C1    | {C4}          | {C2, C3}       | {C2, C3, C4}
>       |  ⤷ Explicitly removed from previousWindow
> 00:18 | Allocate Container-5  | {C4, C5}      | {C2, C3}       | {C2, C3, C4, C5}
> 00:20 | [ROLL] Window tumbles | {}            | {C4, C5}       | {C4, C5}
>       |  ⤷ C2, C3 aged out (still unreported at their second roll)
> ------+-----------------------+---------------+----------------+------------------
> 00:25 | Report confirms C4    | {}            | {C5}           | {C5}
> 00:30 | [ROLL] Window tumbles | {}            | {}             | {}
>       |  ⤷ C5 aged out (still unreported at its second roll)
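
The bucket described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in the pull request: the method names, the use of plain long values in place of ContainerID, and the caller-supplied clock are all hypothetical. Two deliberate choices differ slightly from the description's wording: the roll check uses >= so that a roll fires exactly at the 10-minute mark as in the timeline, and allocation rolls before adding so that a new ID always lands in the fresh current window.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a per-DataNode two-window tumbling bucket. */
final class TwoWindowBucket {
  private final long windowMillis;
  private Set<Long> currentWindow = new HashSet<>();
  private Set<Long> previousWindow = new HashSet<>();
  private long lastRollTime;

  /** nowMillis is passed in (an assumption) so the sketch is testable. */
  TwoWindowBucket(long windowMillis, long nowMillis) {
    this.windowMillis = windowMillis;
    this.lastRollTime = nowMillis;
  }

  /** Lazy roll: runs only when an operation notices the window has elapsed. */
  private void rollIfNeeded(long nowMillis) {
    if (nowMillis - lastRollTime >= windowMillis) {
      previousWindow = currentWindow;   // old previousWindow becomes garbage
      currentWindow = new HashSet<>();
      lastRollTime = nowMillis;
    }
  }

  /** New container allocated: record it in the current window. */
  synchronized void add(long containerId, long nowMillis) {
    rollIfNeeded(nowMillis);
    currentWindow.add(containerId);
  }

  /** Container report received: drop the ID from both windows. */
  synchronized void remove(long containerId) {
    currentWindow.remove(containerId);
    previousWindow.remove(containerId);
  }

  /** Space check: union of both windows, after rolling if needed. */
  synchronized Set<Long> pending(long nowMillis) {
    rollIfNeeded(nowMillis);
    Set<Long> union = new HashSet<>(previousWindow);
    union.addAll(currentWindow);
    return union;
  }
}
```

Replaying the timeline against this sketch (window of 10 minutes, clock starting at 00:00, a pending() call at each roll time) gives the same Total Pending column: {C1, C2, C3} at 00:10, {C4, C5} at 00:20, and an empty set at 00:30. Because rolls are lazy and lastRollTime is set to the time of the triggering operation, roll times in a real run would drift with traffic instead of falling on exact 10-minute boundaries.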



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
