Hi Kota,

I went through your proposal and it looks good. Let us discuss this in our next Ozone community meeting as well. Let us connect on Apache Slack.

Regards,
Prashant

> On Jan 27, 2022, at 11:50 PM, Kota Uenishi <k...@preferred.jp> wrote:
>
> Hi Ozone dev,
>
> I once proposed a fix for HDDS-5905, but it's been a while. Now that our
> cluster has become stable after some work, I have time to resume my
> work on HDDS-5905, and I am facing a design decision on key
> formatting again, as I have learned more about Ozone internals.
>
> Bharat once advised me [1] to use object IDs instead of the
> transaction index (and instead of timestamps), to address restarts and
> cluster upgrades to Ratis. But that has a drawback on object overwrite,
> and I came up with another design choice. The options are:
>
> 1. Use object IDs as the key in the delete table
>    Pros: object IDs are used consistently in OM and are easy to pick up
>    in a RocksDB batch.
>    Cons:
>    - When an object is overwritten, the object ID of the key is not
>      updated, while the previous blocks of the overwritten key are
>      eligible for deletion (see HDDS-5461 and HDDS-5656). Under this
>      condition, there is a race where blocks get lost and will never
>      be collected. An example scenario:
>
>      key open oid=1
>      key commit
>      key open (overwrite) oid=1’ #<= oid must be updated on overwrite, or use update id
>      key delete oid=1
>      key commit
>      key delete oid=1’ (<= overwritten and previous block gets leaked)
>      deletion service deletes 1’
>
>      This behavior should be changed to assign a new oid=2 on overwrite.
>    - Even with this fix, blocks are deleted in the order of key open,
>      not in the order of key deletion. That is better than alphabetical
>      order, but not perfect.
>
> 2. Use update IDs as the key in the delete table
>    Pros: The design is cleaner and the order of block deletion will be correct.
>    Cons:
>    - Currently, the assignment of update IDs is somewhat fuzzy. In most places
>      the raw transaction index is used; in some places the object ID is used
>      as-is, e.g. directory creation (see OMDirectoryCreateRequest.java).
>    - A fix for the update ID assignment would be to prefix them with the epoch
>      number, in the same way as object IDs, but most of the places that set
>      the update ID would need to be fixed.
>
> I feel option 1 is easier but not quite correct, while option 2 is more
> correct but the required change is wide. I updated my proposal accordingly
> [2], so please let me know your thoughts on which to choose. Also, my messy
> working branch can be found here [3].
>
> P.S. My fix for HDDS-5905 conflicts with and depends on HDDS-5656, because
> it's also about key deletion and overwrite. I want to get it reviewed
> and merged beforehand. It's kind of a leftover task from HDDS-5461 and
> should be merged for 1.3.
>
> [1] https://lists.apache.org/thread/79qgx598rv3qcojmzoxhc9ypkh1jj64y
> [2] https://docs.google.com/document/d/1KeyhiE1i5SqRSgLy-pIOGW9X6mUYb8iYEkEoDAEQD9Q/edit#heading=h.nqxuhw78zsv7
> [3] https://github.com/kuenishi/ozone/pull/1
>
> --
> Kota UENISHI, Engineer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> For additional commands, e-mail: dev-h...@ozone.apache.org
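
For reference, the overwrite race described under option 1 can be reproduced with a toy in-memory model. This is only a sketch under stated assumptions: it is not Ozone code, the delete table is modelled as a plain dict keyed by object ID, and the names `run_scenario`, `new_oid_on_overwrite`, `b1` and `b2` are hypothetical. It replays the scenario from the proposal step by step and reports which blocks the deletion service would see.

```python
def run_scenario(new_oid_on_overwrite: bool):
    """Replay the open/commit/overwrite/delete sequence from the proposal.

    Toy model only: the delete table is a dict keyed by object ID, so a
    second put with the same key silently replaces the first entry.
    """
    delete_table = {}            # object ID -> blocks awaiting deletion
    all_blocks = {"b1", "b2"}    # every block ever allocated in this run

    # key open oid=1, key commit: the key now owns block b1
    committed = {"oid": 1, "blocks": ["b1"]}

    # key open (overwrite): a new block b2 is allocated; whether the
    # object ID changes is the behavior under discussion
    overwrite_oid = 2 if new_oid_on_overwrite else 1
    pending = {"oid": overwrite_oid, "blocks": ["b2"]}

    # key delete: the committed version moves to the delete table
    delete_table[committed["oid"]] = committed["blocks"]

    # key commit: the overwrite becomes the committed version
    committed = pending

    # key delete: with an unchanged oid this replaces the earlier entry
    delete_table[committed["oid"]] = committed["blocks"]

    # deletion service: collects whatever is left in the delete table
    collected = sorted(b for blocks in delete_table.values() for b in blocks)
    leaked = sorted(all_blocks - set(collected))
    return collected, leaked


if __name__ == "__main__":
    print("same oid on overwrite:", run_scenario(False))
    print("new oid on overwrite: ", run_scenario(True))
```

With the object ID left unchanged on overwrite, the second delete replaces the first delete-table entry and `b1` is never collected. Assigning a fresh object ID on overwrite keeps both entries, which is the change suggested for option 1.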