[ https://issues.apache.org/jira/browse/HDDS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated HDDS-2542:
------------------------------
    Description: 
The write payload (the chunk itself) is sent to Ratis as an external, 
binary byte array. It is not part of the LogEntry; it is saved from an async 
thread by calling ContainerStateMachine.writeStateMachineData.


As it's written from an async thread, it's possible that the stateMachineData 
is not yet written when the data should be sent to the followers in the next 
heartbeat.
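
For illustration, the write path looks roughly like this (a simplified 
sketch with a made-up signature, not the actual ContainerStateMachine code; 
the real method takes a Ratis log entry):

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified sketch of the write path: the chunk write is handed off
// to a pool thread, so this method returns before the chunk is on
// disk, and the disk write races with the next heartbeat that ships
// the log entry to the followers.
class WritePathSketch {
  private final ExecutorService chunkExecutor = Executors.newFixedThreadPool(60);

  CompletableFuture<Void> writeStateMachineData(long logIndex, byte[] chunk) {
    return CompletableFuture.runAsync(
        () -> writeChunkToContainer(logIndex, chunk), chunkExecutor);
  }

  private void writeChunkToContainer(long logIndex, byte[] chunk) {
    // container disk I/O elided
  }
}
{code}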

By design, a cache is used to avoid this issue, but there are multiple 
problems with the cache.

First, the current cache size is chunkExecutor.getCorePoolSize(), which is not 
enough. By default that means 60 executor threads and a cache of size 60. But 
with one very slow writer and 59 very fast writers, the slow writer's cache 
entry can be invalidated before its write completes.
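
The eviction can be reproduced in isolation. A self-contained demo, assuming 
a Guava cache bounded with maximumSize == chunkExecutor.getCorePoolSize() 
(the class name and the fixed size 60 are mine):

{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Demo: a cache bounded to the pool size (60) drops the slow writer's
// entry as soon as 60 faster writers have added theirs.
public class EvictionDemo {
  public static void main(String[] args) {
    Cache<Long, byte[]> cache = CacheBuilder.newBuilder()
        .concurrencyLevel(1)   // one segment, so LRU order is exact here
        .maximumSize(60)       // == chunkExecutor.getCorePoolSize()
        .build();

    cache.put(0L, new byte[]{1});        // the slow writer's chunk

    for (long i = 1; i <= 60; i++) {     // 60 fast writers
      cache.put(i, new byte[]{1});
    }

    // The slow writer's entry is gone before its disk write finished:
    System.out.println(cache.getIfPresent(0L));   // prints null
  }
}
{code}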

In my tests (freon datanode-chunk-writer-generator) I have seen cache misses 
even with a cache size of 5000.

Second: as readStateMachineData and writeStateMachineData are called from 
two different threads, there is a race condition independent of the cache 
size. It's possible that the write thread has not yet added the data to the 
cache when the read thread needs it.
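
A minimal, self-contained illustration of that interleaving (the class name 
and the timing are mine, not the real code):

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Demo: the reader looks up a log index before the writer's async
// thread has put the chunk into the cache, so the lookup misses.
public class ReadWriteRaceDemo {
  private static final ConcurrentMap<Long, byte[]> cache =
      new ConcurrentHashMap<>();

  public static void main(String[] args) {
    long logIndex = 42L;

    // "writeStateMachineData": the cache is populated from an async
    // thread, after the (slow) chunk write.
    CompletableFuture<Void> write = CompletableFuture.runAsync(() -> {
      sleepMillis(100);                  // stands in for slow chunk I/O
      cache.put(logIndex, new byte[]{1});
    });

    // "readStateMachineData": runs on another thread, here before the
    // put() above has happened.
    System.out.println(cache.get(logIndex));   // prints null: miss

    write.join();
  }

  private static void sleepMillis(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}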

  was:
The write payload (the chunk itself) is sent to Ratis as an external, 
binary byte array. It is not part of the LogEntry; it is saved from an async 
thread by calling ContainerStateMachine.writeStateMachineData.

As it's written from an async thread, it's possible that the stateMachineData 
is not yet written when the data should be sent to the followers in the next 
heartbeat.

By design, a cache is used to avoid this issue, but there are multiple 
problems with the cache.

First, the current cache size is chunkExecutor.getCorePoolSize(), which is not 
enough. By default that means 10 executor threads and a cache of size 10. But 
with one very slow writer and nine very fast writers, the slow writer's cache 
entry can be invalidated before its write completes.

In my tests (freon datanode-chunk-writer-generator) I have seen cache misses 
even with a cache size of 5000.

Second: as readStateMachineData and writeStateMachineData are called from 
two different threads, there is a race condition independent of the cache 
size. It's possible that the write thread has not yet added the data to the 
cache when the read thread needs it.


> Race condition between read and write stateMachineData
> ------------------------------------------------------
>
>                 Key: HDDS-2542
>                 URL: https://issues.apache.org/jira/browse/HDDS-2542
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Marton Elek
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
