[ 
https://issues.apache.org/jira/browse/HDDS-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sadanand Shenoy updated HDDS-7906:
----------------------------------
    Description: 
For most OM requests, such as create, delete, and rename, the flow in HA is as 
follows:

-> Create OmRequest -> PreExecute -> validateAndUpdateCache -> finally update 
the OM RocksDB.

validateAndUpdateCache takes a bucket lock for write operations and updates the 
Table cache of the table that the operation runs on.

Apart from updating the cache, it also adds the OMResponse to the DoubleBuffer, 
which is drained by an always-running daemon thread that calls 
OzoneManagerDoubleBuffer#flush every time a new entry is added to the buffer. 
However, OM doesn't wait for the op to be flushed before returning the response 
to the client. Instead, it sends the response as soon as the cache and the 
DoubleBuffer are updated, and the bucket lock is not held during the actual put 
to the DB.

This addition to the cache is useful because it keeps subsequent reads from 
going to the DB before the flush has happened.
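
The write, read, and flush paths described above can be sketched with a toy 
model. This is only an illustration under simplifying assumptions, not the 
actual OM code: the class CacheFirstStore and its methods are hypothetical, 
plain maps stand in for the Table cache and RocksDB, and flush() is called 
explicitly instead of by a daemon thread.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Toy model of the cache-then-flush write path (hypothetical, not OM code). */
class CacheFirstStore {
  // Stands in for the Table cache.
  private final Map<String, String> tableCache = new ConcurrentHashMap<>();
  // Stands in for RocksDB.
  private final Map<String, String> db = new ConcurrentHashMap<>();
  // Stands in for the DoubleBuffer: pending {key, value} pairs.
  private final Queue<String[]> doubleBuffer = new ConcurrentLinkedQueue<>();

  /** Write path: update the cache, queue the DB update, return right away. */
  void put(String key, String value) {
    tableCache.put(key, value);
    doubleBuffer.add(new String[] {key, value});
  }

  /** Read path: consult the cache first, then fall back to the DB. */
  Optional<String> get(String key) {
    String cached = tableCache.get(key);
    return cached != null
        ? Optional.of(cached)
        : Optional.ofNullable(db.get(key));
  }

  /** Flush path: persist every pending entry, then clean up the cache. */
  void flush() {
    String[] entry;
    while ((entry = doubleBuffer.poll()) != null) {
      db.put(entry[0], entry[1]);
      // Evict only if the cache still holds the value that was just flushed.
      tableCache.remove(entry[0], entry[1]);
    }
  }

  /** True once the key has reached the DB. */
  boolean isPersisted(String key) {
    return db.containsKey(key);
  }
}
```

In this model a get() issued between put() and flush() is served from the 
cache, which is the window the cache is meant to cover.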

For example:

*Rename Key*

t1 -> Update the renamed key in the cache and add the response to the 
DoubleBuffer. The response is returned and the bucket lock is released, but 
RocksDB is not yet updated with the renamed value.

t2 -> Client issues a read on the renamed path. The read checks the cache first 
and sees the renamed value.

t3 -> The renamed value is written to the actual DB and the cache is cleaned up.

 

However, for *CreateSnapshot*:

t1 -> Update the snapshot info in the cache and return the response to the 
client after the bucket lock is released.

t2 -> Client issues a read on the snapshot. The read finds the snapshot info 
object and the checkpoint dir location in the Snapshot Table cache. However, the 
checkpoint is only created when the actual RocksDB is updated in 
OMCreateSnapshotResponse#addToDBBatch, so the read fails.

t3 -> Add snapshotInfo to the DB and create the checkpoint.

 

One fix I can think of is to wait for the checkpoint dir to be created during 
the read if the snapshot info is in the cache.
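
A minimal sketch of that wait, assuming the read path already has the 
checkpoint dir location from the cached snapshot info. The class 
CheckpointWaiter, the method waitForCheckpointDir, and the timeout and poll 
interval parameters are hypothetical names for illustration, not existing Ozone 
APIs. Polling is used here only for simplicity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/** Hypothetical helper: block a read until the checkpoint dir exists. */
class CheckpointWaiter {

  /**
   * Polls until checkpointDir exists as a directory.
   * Throws IOException if it does not appear within the timeout.
   */
  static void waitForCheckpointDir(Path checkpointDir, Duration timeout,
      Duration pollInterval) throws IOException, InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!Files.isDirectory(checkpointDir)) {
      if (System.nanoTime() - deadline >= 0) {
        throw new IOException("Checkpoint dir " + checkpointDir
            + " was not created within " + timeout);
      }
      Thread.sleep(pollInterval.toMillis());
    }
  }
}
```

A reader that found the snapshot info in the cache would call this before 
opening the checkpoint and would fail the read only if the timeout expires.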

Another option would be to create the checkpoint in validateAndUpdateCache 
itself.


> Wait for checkpoint creation if snapshot in cache and not committed to DB.
> --------------------------------------------------------------------------
>
>                 Key: HDDS-7906
>                 URL: https://issues.apache.org/jira/browse/HDDS-7906
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sadanand Shenoy
>            Assignee: Sadanand Shenoy
>            Priority: Major
>


