[ 
https://issues.apache.org/jira/browse/HDDS-11897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ritesh Shukla updated HDDS-11897:
---------------------------------
    Summary: Migrating Ozone Manager replication from post Ratis execution to 
Pre Ratis execution.  (was: This tracks the work needed to change request 
processing execution order for performance improvements and fixing certain edge 
cases for correctness.)

> Migrating Ozone Manager replication from post Ratis execution to Pre Ratis 
> execution.
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-11897
>                 URL: https://issues.apache.org/jira/browse/HDDS-11897
>             Project: Apache Ozone
>          Issue Type: Epic
>          Components: Ozone Manager
>            Reporter: Ritesh Shukla
>            Priority: Major
>              Labels: ozone-performance
>
> The following challenges and solutions are proposed as part of this epic.
>  # The current implementation depends on consensus on the order of requests 
> received rather than on consensus on the processing of the requests.
>  ## This can lead to subtle bugs due to discrepancies in the actual execution 
> of requests on the leader vs the followers.
>  # The double buffer implementation is currently meant to optimize the rate 
> at which writes get flushed to RocksDB, but the effective batching achieved 
> is 1.2 at best. It is also a source of continuous bugs and added complexity 
> for new features.
>  ## The new implementation will not depend on the double buffer behavior.
>  # The number of transactions that can be pushed through Ratis currently 
> averages around 25k.
>  ## Requests will be batched before sending them to Ratis for consensus.
>  # Readers and writers are not separated, and there is potential contention 
> between readers and writers.
>  # Although FSO and OBS bucket types can have finer-grained locking, 
> coarse-grained locks are held at the Bucket level. 
>  ## The new implementation will introduce locking at the start of the request 
> processing to serialize requests that must be linearized against each other. 
>
> These changes and related changes together should result in:
>  # Significant performance improvement in the rate of request processing (3x)
>  # Better code quality and test coverage
>  # Elimination of subtle bugs arising from the write-back cache design of 
> double buffer writes post Ratis 
>  # Fine-grained locking, so that requests that can be processed in parallel 
> do not block each other.
>  # Separation of resources for readers and writers. This will also help 
> process reads from followers using Ratis' capabilities for linearized reads 
> from followers.  
>  
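
As an illustration of the locking point above (serializing only the requests that must be linearized against each other), here is a minimal Java sketch. It is not taken from the Ozone code base: the class name KeyLockSketch, the method execute, and the "volume/bucket/key" string used as the lock identity are all hypothetical, and a real implementation would also need to bound or evict the lock map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: requests that touch the same key path are serialized,
 * while requests on different key paths can run in parallel.
 */
class KeyLockSketch {
  // One lock per key path. The path format is an assumption for illustration;
  // this map is never pruned, which a production design would have to address.
  private final ConcurrentHashMap<String, ReentrantLock> locks =
      new ConcurrentHashMap<>();

  /** Takes the lock for keyPath at the start of processing, then runs the request. */
  <T> T execute(String keyPath, Supplier<T> request) {
    ReentrantLock lock = locks.computeIfAbsent(keyPath, k -> new ReentrantLock());
    lock.lock();
    try {
      return request.get();
    } finally {
      // Always release, even if the request body throws.
      lock.unlock();
    }
  }
}
```

With this shape, two writers to "vol1/bucket1/keyA" run one after the other, while a writer to "vol1/bucket1/keyB" is not held up by either of them. A bucket-level lock would serialize all three.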



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
