[ 
https://issues.apache.org/jira/browse/HDDS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-6162:
-----------------------------
    Summary: limit OM DoubleBuffer pending request to avoid taking too much 
memory   (was: limit OM pending request size to avoid taking too much memory )

> limit OM DoubleBuffer pending request to avoid taking too much memory 
> ----------------------------------------------------------------------
>
>                 Key: HDDS-6162
>                 URL: https://issues.apache.org/jira/browse/HDDS-6162
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Jie Yao
>            Assignee: Jie Yao
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, if OM HA is enabled, when a client request arrives at the OM 
> leader, the request is written to the Ratis log and replicated to the other 
> two followers. 
> The request is then handled by OM as follows:
> 1 statemachineUpdater (Ratis) applies each log entry to omStatemachine by 
> calling statemachine#applyTransaction.
> 2 In `applyTransaction`, a request-handling function is wrapped into 
> `runcommand` and submitted to a single-thread thread pool.
> 3 `runcommand` is put into an *unbounded blocking queue* of the thread 
> pool, and the single thread in the pool takes tasks from the queue 
> and executes them one by one.
> 4 When a task is executed, the request is handled and put into the 
> omDoubleBuffer's currentBuffer, which is also an *unbounded blocking queue.*
> 5 If the currentBuffer is not empty, omDoubleBuffer swaps currentBuffer 
> and readyBuffer.
> 6 An asynchronous flush thread puts all the requests in the readyBuffer 
> into a RocksDB batch, and then commits the batch to the DB.
>  
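
The swap-and-flush flow in steps 4 to 6 can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual OM code: the class name `DoubleBufferSketch`, its methods, and the use of plain strings for requests are all assumptions made for illustration, and the RocksDB batch commit is replaced by appending to an in-memory list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of the double-buffer flow (not the real OM class). */
class DoubleBufferSketch {
    // Both buffers are unbounded, mirroring the behaviour described in the issue.
    private Queue<String> currentBuffer = new ArrayDeque<>();
    private Queue<String> readyBuffer = new ArrayDeque<>();
    // Stand-in for the DB: each entry is one committed batch.
    private final List<List<String>> committedBatches = new ArrayList<>();

    /** Step 4: the request handler adds its result to currentBuffer. */
    synchronized void add(String request) {
        currentBuffer.add(request);
    }

    /**
     * Steps 5 and 6: one iteration of the flush loop. Assumes a single flush
     * thread. Returns the number of requests committed in this batch.
     */
    int flushOnce() {
        Queue<String> toFlush;
        synchronized (this) {
            if (currentBuffer.isEmpty()) {
                return 0;
            }
            // Swap the two buffers so new requests keep landing in an empty buffer.
            Queue<String> tmp = currentBuffer;
            currentBuffer = readyBuffer;
            readyBuffer = tmp;
            toFlush = readyBuffer;
        }
        // Outside the lock: build one batch and "commit" it.
        List<String> batch = new ArrayList<>(toFlush);
        toFlush.clear();
        committedBatches.add(batch);
        return batch.size();
    }

    /** Number of requests waiting in currentBuffer. */
    synchronized int pending() {
        return currentBuffer.size();
    }

    /** Number of batches committed so far. */
    int batchCount() {
        return committedBatches.size();
    }
}
```

Nothing in this model limits how many requests `add` accepts between two flushes, so `pending()` grows without bound whenever the commit is slower than the arrival rate.
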
> This can cause a problem. If there is a large number of requests but the 
> commit operation is time-consuming, more and more requests pile up in 
> the blocking queue of the thread pool or in the currentBuffer of the 
> omDoubleBuffer, which consumes a great deal of memory. In our cluster, when 
> we used cosbench for a stress test, the two queues grew very large and led 
> to a very long full GC (about five minutes) that reclaimed very little 
> space. What's more, this leads to heartbeat timeouts and re-election 
> in Ratis, which makes OM unavailable.
> So the idea here is to limit the size of the blocking queues 
> above and make the size configurable. When the max size is hit, the client 
> request should be blocked.
> By the way, Raft is strongly sequential, so every Raft request must be 
> handled sequentially, even if the requests are independent. So maybe we 
> could refactor the current implementation of omStatemachine, perhaps along 
> the lines of the SCM state machine.
>  
>  
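
One way to express the proposed limit is a counting semaphore sized by a configurable maximum: a request takes a permit before it is queued, and the flush thread returns permits once a batch is committed. The sketch below is only an illustration under that assumption. The class `PendingRequestLimiter` and its method names are hypothetical and do not describe the patch attached to this issue.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical limiter for pending (unflushed) requests; names are illustrative only. */
class PendingRequestLimiter {
    private final int maxPending;
    private final Semaphore permits;

    /** maxPending would come from a configuration key (assumed, not named here). */
    PendingRequestLimiter(int maxPending) {
        this.maxPending = maxPending;
        this.permits = new Semaphore(maxPending);
    }

    /** Blocks the caller until a slot is free, which is the back-pressure the issue asks for. */
    void acquire() throws InterruptedException {
        permits.acquire();
    }

    /** Non-blocking variant: returns false when the limit is already reached. */
    boolean tryAcquire() {
        return permits.tryAcquire();
    }

    /** Called by the flush thread after a batch of the given size is committed. */
    void release(int flushedCount) {
        permits.release(flushedCount);
    }

    /** Requests currently admitted but not yet flushed. */
    int pending() {
        return maxPending - permits.availablePermits();
    }
}
```

With a limit of 2, the third `tryAcquire()` fails until `release` is called, so the memory held by queued requests stays proportional to the configured maximum instead of the arrival rate.
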



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
