errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1556607139
##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
Review Comment:
I guess this depends on an implementation detail that still needs to be specified in the doc:
- If rewrite keeps the same update ID that the key had at the time it was replaced, then reusing the update ID field makes sense, because it just stores its final value as intended. This is probably not a good way to do it though, because we want the rewrite to count as an update to the key as well.
- If rewrite increments the update ID, then a new field is probably better. That way we can treat rewrite as a new operation.

So I guess we want the following in the doc:
- Rewrite increments the update ID on commit, the same as any other commit operation.
- If we go the route of persisting rewriteID (or whatever the field is called) to the open key table, do we also persist it to the DB?
  - This would give us an indication that the file was rewritten, although that is more what the audit logs are for than a DB dump.
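
To make the second option concrete, here is a minimal in-memory sketch of the behavior I have in mind. It is not Ozone code: `RewriteSketch`, `commitRewrite`, and `expectedUpdateId` are hypothetical names, and the map and counter only stand in for the OM key table and the Ratis transaction ID. The assumption it illustrates is that a rewrite is rejected when the stored update ID differs from the one the client read, and that a successful rewrite gets a fresh update ID like any other commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for OM key commit handling; an illustration only. */
final class RewriteSketch {
    /** Committed key state; only the fields needed for the sketch. */
    static final class KeyInfo {
        final long updateId;
        final String data;

        KeyInfo(long updateId, String data) {
            this.updateId = updateId;
            this.data = data;
        }
    }

    private final Map<String, KeyInfo> keyTable = new HashMap<>();
    // Stands in for the Ratis transaction ID: every request consumes one.
    private long transactionId = 0;

    /** Ordinary commit: last writer wins and the update ID is replaced. */
    synchronized long commit(String key, String data) {
        long id = ++transactionId;
        keyTable.put(key, new KeyInfo(id, data));
        return id;
    }

    /**
     * Rewrite commit: applied only if the key still has the update ID the
     * client read earlier. On success the key gets a new update ID, so the
     * rewrite counts as an update in its own right.
     */
    synchronized boolean commitRewrite(String key, String data, long expectedUpdateId) {
        long id = ++transactionId;
        KeyInfo current = keyTable.get(key);
        if (current == null || current.updateId != expectedUpdateId) {
            return false; // key was deleted or changed since it was read
        }
        keyTable.put(key, new KeyInfo(id, data));
        return true;
    }

    /** Returns the current update ID, or -1 if the key does not exist. */
    synchronized long updateIdOf(String key) {
        KeyInfo current = keyTable.get(key);
        return current == null ? -1 : current.updateId;
    }
}
```

With this shape, a client that read the key before a concurrent commit gets `false` back from `commitRewrite` and has to re-read, while a client holding the current update ID succeeds and sees the update ID move forward. Whether the expected ID also needs to be persisted beyond the open key table is the open question above.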
