[
https://issues.apache.org/jira/browse/HDDS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sammi Chen updated HDDS-6162:
-----------------------------
Summary: limit OM DoubleBuffer pending requests to avoid taking too much
memory (was: limit OM pending request size to avoid taking too much memory)
> limit OM DoubleBuffer pending requests to avoid taking too much memory
> ----------------------------------------------------------------------
>
> Key: HDDS-6162
> URL: https://issues.apache.org/jira/browse/HDDS-6162
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Jie Yao
> Assignee: Jie Yao
> Priority: Major
> Labels: pull-request-available
>
> Currently, if OM HA is enabled, a client request that arrives at the OM
> leader is written to the Ratis log and replicated to the other two
> followers.
> The request is then handled by the OM as follows:
> 1. The StateMachineUpdater (Ratis) applies each log entry to the OM state
> machine by calling StateMachine#applyTransaction.
> 2. In `applyTransaction`, the request-handling function is wrapped in a
> `runCommand` task and submitted to a single-thread thread pool.
> 3. The `runCommand` task is put into an *unbounded blocking queue* of the
> thread pool, and the single thread in the pool takes the tasks from the
> queue and executes them one by one.
> 4. When a task executes, the request is handled and put into the
> OMDoubleBuffer's currentBuffer, which is also an *unbounded* collection.
> 5. If the currentBuffer is not empty, the OMDoubleBuffer swaps
> currentBuffer and readyBuffer.
> 6. An asynchronous flush thread puts all the requests in the readyBuffer
> into a RocksDB batch and then commits the batch to the DB.
>
> So there may be a problem: if there are a large number of requests but the
> batch commit is time-consuming, more and more requests pile up in the
> blocking queue of the thread pool and in the currentBuffer of the
> OMDoubleBuffer, consuming a lot of memory. In our cluster, when we used
> COSBench for a stress test, the two queues grew very large, leading to a
> very long full GC (about five minutes) that reclaimed very little space.
> What's more, this led to Ratis heartbeat timeouts and re-election, making
> the OM unavailable.
> So the idea here is that we need to limit the size of the blocking queues
> above and make the limit configurable. When the maximum size is hit, the
> client request should be blocked.
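One standard way to get that back-pressure is a bounded queue whose rejection handler blocks the submitter until space frees up. The sketch below is a hypothetical illustration of the idea using plain `java.util.concurrent`, not a proposed Ozone API; the class name and capacity parameter are made up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a single-thread executor whose pending-task queue is
// capped at a configurable size, so a submitter blocks (back-pressure)
// instead of letting the queue grow without bound.
class BoundedSingleThreadExecutor {
    static ThreadPoolExecutor create(int maxPending) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(maxPending);
        // When the bounded queue is full, block the submitting thread on
        // put() instead of using the default abort policy.
        RejectedExecutionHandler blockOnFull = (task, executor) -> {
            try {
                executor.getQueue().put(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        };
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
            queue, blockOnFull);
    }
}
```

A similar cap (a bounded queue, or a size/byte threshold that pauses the producer) could be applied to the OMDoubleBuffer's currentBuffer.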
> By the way, Raft is strongly sequential, so every Raft request must be
> handled in order even when the requests are independent. So maybe we could
> refactor the current implementation of the OM state machine, perhaps along
> the lines of the SCM state machine.
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]