[
https://issues.apache.org/jira/browse/HDDS-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-14921:
----------------------------------
Labels: pull-request-available (was: )
> Improve space accounting in SCM with In-Flight container allocation tracking
> ----------------------------------------------------------------------------
>
> Key: HDDS-14921
> URL: https://issues.apache.org/jira/browse/HDDS-14921
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Ashish Kumar
> Assignee: Ashish Kumar
> Priority: Major
> Labels: pull-request-available
>
> The current disk space management and container allocation mechanism in SCM
> (Storage Container Manager) relies heavily on periodic DataNode (DN)
> heartbeat reports and static policies, such as a pipeline count derived from
> the number of disks. This approach has two main problems:
> # *Stale Space Visibility*
> SCM makes allocation decisions based on heartbeat-reported disk space, which
> can lag behind actual usage. During high allocation rates, this delay leads
> to inaccurate space estimation and potential over-allocation.
> # *Burst Allocation Risk*
> Rapid container allocations within short intervals are not accounted for
> immediately, allowing multiple allocations against the same reported free
> space. This can oversubscribe disks and result in sudden disk exhaustion.
> *Solution:*
> A two-window tumbling bucket, similar to HADOOP-3707.
> *Two Windows Per DataNode*
> Each DataNode has a TwoWindowBucket containing:
> - currentWindow: Containers allocated in the current 10-minute interval
> - previousWindow: Containers from the previous 10-minute interval
> - lastRollTime: Timestamp of last roll
> *Container Allocation Flow: New Container Allocated*
>
> - Add ContainerID to currentWindow
> - Check if roll needed (time > lastRollTime + 10min)
> - If yes: previousWindow = currentWindow; currentWindow = {}
> *Space check: Get Pending Allocations*
> - Roll if needed
> - Return UNION(currentWindow, previousWindow)
> - pendingSize = union.size() × maxContainerSize
> - effectiveSpace = remainingSpace - pendingSize
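The space check above can be sketched as plain arithmetic. This is a minimal illustration, not code from the pull request: the class name `PendingSpaceMath`, both method names, and the admission rule in `canAllocate` are assumptions made for the example, and the 5 GB container size in `main` is only a sample value.

```java
/** Hypothetical helper; the names here are illustrative, not Ozone APIs. */
final class PendingSpaceMath {
  private PendingSpaceMath() { }

  /** effectiveSpace = remainingSpace - (pendingCount x maxContainerSize). */
  static long effectiveSpace(long remainingBytes, int pendingCount,
      long maxContainerSizeBytes) {
    // Every pending container is charged at full size, since its real
    // usage is unknown until the DataNode reports it.
    long pendingBytes =
        Math.multiplyExact((long) pendingCount, maxContainerSizeBytes);
    return remainingBytes - pendingBytes;
  }

  /** Assumed rule: allow an allocation only if one more full container fits. */
  static boolean canAllocate(long remainingBytes, int pendingCount,
      long maxContainerSizeBytes) {
    return effectiveSpace(remainingBytes, pendingCount, maxContainerSizeBytes)
        >= maxContainerSizeBytes;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    // 100 GB reported free, 3 pending containers of 5 GB each.
    System.out.println(effectiveSpace(100 * gb, 3, 5 * gb) / gb); // prints 85
  }
}
```

Note that the result can be negative when pending allocations already exceed the reported free space; how the caller treats that case is left open here.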
> *Container report: Container Report Received*
> - Remove ContainerID from BOTH windows
> - More accurate than waiting for automatic aging
> - Falls back to aging if the report is delayed or missed
> *Automatic Aging: Roll*
> Every 10 Minutes (Triggered Lazily on Operations):
> 1. previousWindow = currentWindow
> 2. currentWindow = {} (new empty set)
> 3. lastRollTime = now
> 4. Old previousWindow is garbage collected
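The bucket described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the class from the pull request: the method names `add`, `pending`, and `remove` are hypothetical, container IDs are modelled as plain `long` values, the clock is passed in explicitly so the behaviour is easy to test, and the roll uses `>=` so that a roll fires exactly on the window boundary. The sketch also rolls before adding a new ID, so that a new allocation always starts in the fresh window.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative per-DataNode bucket; not the class from the pull request. */
final class TwoWindowBucket {
  private final long windowMillis;
  private Set<Long> currentWindow = new HashSet<>();
  private Set<Long> previousWindow = new HashSet<>();
  private long lastRollTime;

  TwoWindowBucket(long windowMillis, long nowMillis) {
    this.windowMillis = windowMillis;
    this.lastRollTime = nowMillis;
  }

  /** Lazy roll: runs only when an operation notices the window has elapsed. */
  private void rollIfNeeded(long nowMillis) {
    if (nowMillis - lastRollTime >= windowMillis) {
      previousWindow = currentWindow;   // old previousWindow becomes garbage
      currentWindow = new HashSet<>();
      lastRollTime = nowMillis;
    }
  }

  /** Records a newly allocated container as pending. */
  synchronized void add(long containerId, long nowMillis) {
    rollIfNeeded(nowMillis);            // assumption: roll first, then add
    currentWindow.add(containerId);
  }

  /** Returns a copy of UNION(currentWindow, previousWindow). */
  synchronized Set<Long> pending(long nowMillis) {
    rollIfNeeded(nowMillis);
    Set<Long> union = new HashSet<>(currentWindow);
    union.addAll(previousWindow);
    return union;
  }

  /** Called when a container report confirms the container. */
  synchronized void remove(long containerId) {
    currentWindow.remove(containerId);
    previousWindow.remove(containerId);
  }
}
```

Because the roll is lazy, an unreported container stays pending for between one and two window lengths after its allocation, depending on where in the window it was added. Swapping set references on roll keeps the aging step O(1) regardless of how many IDs are pending.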
>
> +*Timeline Example*+
> Time  | Action                | CurrentWindow | PreviousWindow | Total Pending
> ------+-----------------------+---------------+----------------+------------------
> 00:00 | Allocate Container-1  | {C1}          | {}             | {C1}
> 00:05 | Allocate Container-2  | {C1, C2}      | {}             | {C1, C2}
> 00:08 | Allocate Container-3  | {C1, C2, C3}  | {}             | {C1, C2, C3}
> 00:10 | [ROLL] Window tumbles | {}            | {C1, C2, C3}   | {C1, C2, C3}
>       |   ⤷ previousWindow ← currentWindow
>       |   ⤷ currentWindow ← {} (reset)
> ------+-----------------------+---------------+----------------+------------------
> 00:12 | Allocate Container-4  | {C4}          | {C1, C2, C3}   | {C1, C2, C3, C4}
> 00:15 | Report confirms C1    | {C4}          | {C2, C3}       | {C2, C3, C4}
>       |   ⤷ Explicitly removed from previousWindow
> 00:18 | Allocate Container-5  | {C4, C5}      | {C2, C3}       | {C2, C3, C4, C5}
> 00:20 | [ROLL] Window tumbles | {}            | {C4, C5}       | {C4, C5}
>       |   ⤷ C2, C3 aged out (not reported within two windows)
> ------+-----------------------+---------------+----------------+------------------
> 00:25 | Report confirms C4    | {}            | {C5}           | {C5}
> 00:30 | [ROLL] Window tumbles | {}            | {}             | {}
>       |   ⤷ C5 aged out (not reported within two windows)