Hi Ozone devs,

I proposed a fix for HDDS-5905 a while ago, but it stalled. Our cluster has become stable after some work, so I now have time to resume it. As I learned more about Ozone internals, I ran into the design decision on the delete table key format again.

Bharat once advised me [1] to use object IDs instead of transaction indexes (and instead of timestamps), to handle restarts and the cluster upgrade to Ratis. However, object IDs have a drawback on object overwrite, so I came up with a second design choice. The two options are:

1. Use object IDs as the key in the delete table

Pros: Object IDs are used consistently in OM and are easy to pick up in a RocksDB batch.

Cons:
- When an object is overwritten, the object ID of the key is not updated, while the previous blocks of the overwritten key become eligible for deletion (see HDDS-5461 and HDDS-5656). Under this condition there is a race in which blocks are lost and never collected. An example scenario:

    key open               oid=1
    key commit
    key open (overwrite)   oid=1’   # <= oid must be updated on overwrite, or use the update ID
    key delete             oid=1
    key commit
    key delete             oid=1’   # <= entry is overwritten and the previous blocks are leaked
    deletion service deletes 1’

  This behavior should be changed so that a new oid=2 is assigned on overwrite.
- Even with this fix, blocks are deleted in the order of key open, not in the order of key deletion. That is better than alphabetical order, but not perfect.

2. Use update IDs as the key in the delete table

Pros: The design is cleaner and the order of block deletion will be correct.

Cons:
- Currently, the assignment of update IDs is somewhat inconsistent. Most places use the raw transaction index, while some use the object ID as-is, e.g. directory creation (see OMDirectoryCreateRequest.java).
- A fix would be to prefix update IDs with the epoch number, as is done for object IDs, but most of the code that sets update IDs would have to change.

I feel option 1 is easier but not quite correct, while option 2 is more correct but requires wide-ranging changes. I updated my proposal accordingly [2], so please let me know your thoughts on which to choose. My messy working branch can be found here [3].

P.S. My fix for HDDS-5905 conflicts with and depends on HDDS-5656, because that one is also about key deletion and overwrite.
I would like to get HDDS-5656 reviewed and merged first. It is a leftover task from HDDS-5461 and should be merged for 1.3.

[1] https://lists.apache.org/thread/79qgx598rv3qcojmzoxhc9ypkh1jj64y
[2] https://docs.google.com/document/d/1KeyhiE1i5SqRSgLy-pIOGW9X6mUYb8iYEkEoDAEQD9Q/edit#heading=h.nqxuhw78zsv7
[3] https://github.com/kuenishi/ozone/pull/1

--
Kota UENISHI, Engineer
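
For illustration, the race described under option 1 can be reproduced with a toy model. This is only a sketch under stated assumptions, not Ozone code: the class DeleteTableSketch and its methods are hypothetical, a TreeMap stands in for the RocksDB delete table, and the block names b1 and b2 are made up. It shows that when the overwriting key reuses the same ID, the second delete replaces the first entry and the old blocks are never collected, while assigning a fresh ID on overwrite keeps both entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: these names are hypothetical and do not exist in Ozone.
final class DeleteTableSketch {
    // Stand-in for the delete table: id -> blocks awaiting deletion,
    // iterated in ascending id order like sorted RocksDB keys.
    private final TreeMap<Long, List<String>> deleteTable = new TreeMap<>();

    // Records blocks under the given id. Returns the blocks of any entry
    // that was silently replaced, i.e. blocks that would be leaked.
    List<String> markDeleted(long id, List<String> blocks) {
        List<String> replaced = deleteTable.put(id, new ArrayList<>(blocks));
        return replaced == null ? List.of() : replaced;
    }

    // Returns everything the deletion service would collect, in table order.
    List<String> collect() {
        List<String> collected = new ArrayList<>();
        deleteTable.values().forEach(collected::addAll);
        deleteTable.clear();
        return collected;
    }

    // Replays the scenario: a key with block b1 is overwritten with block b2,
    // and both versions end up in the delete table.
    static List<String> replay(boolean newIdOnOverwrite) {
        DeleteTableSketch table = new DeleteTableSketch();
        long firstId = 1L;
        long overwriteId = newIdOnOverwrite ? 2L : firstId;
        table.markDeleted(firstId, List.of("b1"));      // old version's blocks
        table.markDeleted(overwriteId, List.of("b2"));  // overwriting version's blocks
        return table.collect();
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // [b2]      (b1 is leaked)
        System.out.println(replay(true));  // [b1, b2]
    }
}
```

The same collision would apply to any delete table key that is not unique per key version, which is why both options above need an ID that changes on overwrite.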