Hi Kota,

I went through your proposal and it looks good.
Let us discuss this in our next Ozone community meeting as well. Let us connect
on Apache Slack.

Regards,
Prashant


> On Jan 27, 2022, at 11:50 PM, Kota Uenishi <k...@preferred.jp> wrote:
> 
> Hi Ozone dev,
> 
> I once proposed a fix for HDDS-5905, but it's been a while. Now that our
> cluster has become stable after some work, I have time to resume my
> work on HDDS-5905, and I am facing a design decision on key
> formatting again, having learned more about Ozone internals.
> 
> Bharat once advised me [1] to use object IDs instead of the
> transaction index (and instead of timestamps), to handle restarts and
> cluster upgrades to Ratis. But that has a drawback on object overwrite,
> so I came up with another design choice. The two options are:
> 
> 1. Use object IDs as the key in the delete table
> Pros: object IDs are used consistently in OM and are easy to pick up in a
> RocksDB batch.
> Cons:
> - When an object is overwritten, the object ID of the key is not updated,
>   while the previous blocks of the overwritten key become eligible for
>   deletion (see HDDS-5461 and HDDS-5656). Under this condition, there is a
>   race where blocks get lost and will never be collected. An example
>   scenario:
> 
> key open  oid=1
> key commit
> key open (overwrite) oid=1’  #<= oid must be updated on overwrite, or use update id
> key delete oid=1
> key commit
> key delete oid=1’ (<= overwritten and previous block gets leaked)
> deletion service deletes 1’
> 
>   This behavior should be changed so that a new oid=2 is assigned on
>   overwrite.
> - Even with that fix, blocks are deleted in the order of key open, not in
>   the order of key deletion. That is better than alphabetical order, but
>   not perfect.
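
The race in option 1 can be illustrated with a toy model. This is only a sketch of one reading of the scenario above, not Ozone code: it assumes the delete table behaves like a plain map from object ID to a block list where a second put under the same ID replaces the first. The names `simulate`, `delete_table`, `blk-A` and `blk-B` are hypothetical.

```python
def simulate(new_oid_on_overwrite):
    """Return the blocks that never reach the (modelled) delete table."""
    delete_table = {}  # hypothetical: object ID -> blocks awaiting deletion
    next_oid = 1

    # key open + key commit: first version of the key
    oid, next_oid = next_oid, next_oid + 1
    blocks = ["blk-A"]

    # key open (overwrite) + key commit: the old blocks become garbage
    old_oid, old_blocks = oid, blocks
    if new_oid_on_overwrite:
        oid, next_oid = next_oid, next_oid + 1
    blocks = ["blk-B"]
    delete_table[old_oid] = old_blocks

    # key delete: current blocks are queued under the key's object ID;
    # this replaces the entry above when the object ID was not updated
    delete_table[oid] = blocks

    queued = {b for bs in delete_table.values() for b in bs}
    return sorted({"blk-A", "blk-B"} - queued)


print(simulate(new_oid_on_overwrite=False))  # old blocks are lost
print(simulate(new_oid_on_overwrite=True))   # nothing is lost
```

In this model the first call reports `blk-A` as leaked, while assigning a fresh object ID on overwrite keeps both entries distinct.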
> 
> 2. Use update IDs as the key in the delete table
> Pros: the design is cleaner and the order of block deletion will be correct.
> Cons:
> - Currently, the assignment of update IDs is somewhat inconsistent. In most
>   places the raw transaction index is used, but in some places the object ID
>   is used as-is, e.g. directory creation (see OMDirectoryCreateRequest.java).
> - A fix for the update ID assignment would be to prefix them with the epoch
>   number, as with object IDs, but most of the code that sets update IDs
>   would have to be changed.
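
To make the ordering argument in option 2 concrete, here is a minimal sketch of an epoch-prefixed key. It assumes the delete table is iterated in bytewise key order (the RocksDB default comparator); the 16-bit epoch and 64-bit update ID widths, and the function names, are hypothetical and not taken from the Ozone code base.

```python
import struct


def delete_table_key(epoch, update_id):
    """Encode (epoch, update_id) as fixed-width big-endian bytes."""
    # ">HQ": big-endian, unsigned 16-bit epoch, unsigned 64-bit update ID
    return struct.pack(">HQ", epoch, update_id)


def decode_key(key):
    """Inverse of delete_table_key."""
    return struct.unpack(">HQ", key)


# Deletions recorded in this order, including one after an epoch bump
ids = [(0, 9), (0, 10), (0, 255), (0, 256), (1, 2)]
keys = [delete_table_key(e, u) for e, u in ids]

# Bytewise order of the keys matches the numeric (epoch, update_id) order
print([decode_key(k) for k in sorted(keys)] == ids)
```

Fixed-width big-endian encoding is the point of the sketch: decimal strings would not work as keys, because `"10"` sorts before `"9"` bytewise.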
> 
> I feel option 1 is easier but not quite correct, while option 2 is more
> correct but requires wide-ranging changes. I updated my proposal
> accordingly [2], so please let me know your thoughts on which to choose.
> Also, my messy working branch can be found here [3].
> 
> P.S. My fix for HDDS-5905 conflicts with and depends on HDDS-5656, because
> that is also about key deletion and overwrite. I would like to get
> HDDS-5656 reviewed and merged first. It is a leftover task from HDDS-5461
> and should be merged for 1.3.
> 
> [1] https://lists.apache.org/thread/79qgx598rv3qcojmzoxhc9ypkh1jj64y
> [2] https://docs.google.com/document/d/1KeyhiE1i5SqRSgLy-pIOGW9X6mUYb8iYEkEoDAEQD9Q/edit#heading=h.nqxuhw78zsv7
> [3] https://github.com/kuenishi/ozone/pull/1
> 
> -- 
> Kota UENISHI, Engineer
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> For additional commands, e-mail: dev-h...@ozone.apache.org
> 


