szetszwo commented on code in PR #7583: URL: https://github.com/apache/ozone/pull/7583#discussion_r2017513870
########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3. Key Strip Lock: Locking for key + +**Note**: Multiple keys locking (like delete multiple keys or rename operation), lock needs to be taken in order, i.e. using StrippedLocking order to avoid deadlock. + +Stripped locking ordering: +- Strip lock is obtained over a hash bucket. Review Comment: What are hash buckets? Is it the same as Ozone buckets? ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3. Key Strip Lock: Locking for key Review Comment: Please describe what these locks are protecting. A lock structure should generally look like a tree. Why don't we need a root lock here? ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3.
Key Strip Lock: Locking for key + +**Note**: Multiple keys locking (like delete multiple keys or rename operation), lock needs to be taken in order, i.e. using StrippedLocking order to avoid deadlock. + +Stripped locking ordering: +- Strip lock is obtained over a hash bucket. +- All keys needs to be ordered with hash bucket +- And then need take lock in sequence order + +## OBS operation +Bucket read lock will be there default. This is to ensure: +- key operation uses updated bucket acl, quota and other properties +- key does not becomes dandling when parallel bucket is deleted + +Note: Volume lock is not required as key depends on bucket only to retrieve information and bucket as parent. + +For key operations in OBS buckets, the following concurrency control is proposed: + +| API Name | Locking Key | Notes | +|-------------------------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------| +| CreateKey | Bucket Read Lock, `No Lock` for key | Key can be created parallel by client in open key table, so do not need key lock | +| CommitKey | Bucket Read, Key Write lock | Avoid parallel key commit by different client, otherwise it can leave dangling blocks on overwrite | +| InitiateMultiPartUpload | Bucket Read Lock, `No Lock` for key | Key can be created parallel by client with different uploadId in open key table, so do not need key lock | +| CommitMultiPartUpload | WriteLock: PartKey Name | Avoid same part commit in parallel, else it can leave dangling blocks on overwrite | +| CompleteMultiPartUpload | Bucket Read, Key Write lock | Avoid parallel multi-part upload complete with different uploadId, else overwrite can cause dangling blocks | +| AbortMultiPartUpload | Bucket Read, Key Write lock | Need to avoid other operation in parallel like commit part | +| DeleteKey | Bucket Read, Key Write lock | Avoid create and delete in parallel | +| DeleteKeys | Bucket Read, Key 
Write with ordered lock | Avoid create and delete in parallel, ordered key locks to avoid deadlock | +| RenameKey | Bucket Read, Keys Write with ordered lock | lock in order to delete key from original location and move to new location | +| SetAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| AddAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| RemoveAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| AllocateBlock | Bucket Read, Key Write lock | Need lock key to avoid wrong update of key, if same client does parallel allocate | +| SetTimes | Bucket Read, Key Write lock | Avoid parallel delete key | + +Batch Operation: +1. deleteKeys: batch will be divided to multiple threads in Execution Pool to run parallel calling DeleteKey +2. RenameKeys: This is `depreciated`, but for compatibility, will be divided to multiple threads in Execution Pool to run parallel calling RenameKey + +For batch operation, atomicity is not guranteed for above api, and same is behavior for s3 perspective. + +## Bucket and volume locking as required for concurrency for obs key handling + +### Volume Operation Review Comment: We probably should start with Volume, then Bucket and then Key. ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3. Key Strip Lock: Locking for key + +**Note**: Multiple keys locking (like delete multiple keys or rename operation), lock needs to be taken in order, i.e. using StrippedLocking order to avoid deadlock. + +Stripped locking ordering: +- Strip lock is obtained over a hash bucket. +- All keys needs to be ordered with hash bucket +- And then need take lock in sequence order + +## OBS operation +Bucket read lock will be there default. This is to ensure: Review Comment: Even for volume level operations? ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3. 
Key Strip Lock: Locking for key + +**Note**: Multiple keys locking (like delete multiple keys or rename operation), lock needs to be taken in order, i.e. using StrippedLocking order to avoid deadlock. + +Stripped locking ordering: +- Strip lock is obtained over a hash bucket. +- All keys needs to be ordered with hash bucket +- And then need take lock in sequence order + +## OBS operation +Bucket read lock will be there default. This is to ensure: +- key operation uses updated bucket acl, quota and other properties +- key does not becomes dandling when parallel bucket is deleted + +Note: Volume lock is not required as key depends on bucket only to retrieve information and bucket as parent. + +For key operations in OBS buckets, the following concurrency control is proposed: + +| API Name | Locking Key | Notes | +|-------------------------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------| +| CreateKey | Bucket Read Lock, `No Lock` for key | Key can be created parallel by client in open key table, so do not need key lock | +| CommitKey | Bucket Read, Key Write lock | Avoid parallel key commit by different client, otherwise it can leave dangling blocks on overwrite | +| InitiateMultiPartUpload | Bucket Read Lock, `No Lock` for key | Key can be created parallel by client with different uploadId in open key table, so do not need key lock | +| CommitMultiPartUpload | WriteLock: PartKey Name | Avoid same part commit in parallel, else it can leave dangling blocks on overwrite | +| CompleteMultiPartUpload | Bucket Read, Key Write lock | Avoid parallel multi-part upload complete with different uploadId, else overwrite can cause dangling blocks | +| AbortMultiPartUpload | Bucket Read, Key Write lock | Need to avoid other operation in parallel like commit part | +| DeleteKey | Bucket Read, Key Write lock | Avoid create and delete in parallel | +| DeleteKeys | Bucket Read, Key 
Write with ordered lock | Avoid create and delete in parallel, ordered key locks to avoid deadlock | +| RenameKey | Bucket Read, Keys Write with ordered lock | lock in order to delete key from original location and move to new location | +| SetAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| AddAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| RemoveAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| AllocateBlock | Bucket Read, Key Write lock | Need lock key to avoid wrong update of key, if same client does parallel allocate | +| SetTimes | Bucket Read, Key Write lock | Avoid parallel delete key | + +Batch Operation: +1. deleteKeys: batch will be divided to multiple threads in Execution Pool to run parallel calling DeleteKey +2. RenameKeys: This is `depreciated`, but for compatibility, will be divided to multiple threads in Execution Pool to run parallel calling RenameKey Review Comment: Compatibility might not be important here if our target is Ozone 3.0.0. ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. 
+ +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3. Key Strip Lock: Locking for key + +**Note**: Multiple keys locking (like delete multiple keys or rename operation), lock needs to be taken in order, i.e. using StrippedLocking order to avoid deadlock. + +Stripped locking ordering: +- Strip lock is obtained over a hash bucket. +- All keys needs to be ordered with hash bucket +- And then need take lock in sequence order Review Comment: Define "ordered with hash bucket" and "sequence order". ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3. Key Strip Lock: Locking for key + +**Note**: Multiple keys locking (like delete multiple keys or rename operation), lock needs to be taken in order, i.e. using StrippedLocking order to avoid deadlock. + +Stripped locking ordering: +- Strip lock is obtained over a hash bucket. 
+- All keys needs to be ordered with hash bucket +- And then need take lock in sequence order + +## OBS operation +Bucket read lock will be there default. This is to ensure: +- key operation uses updated bucket acl, quota and other properties +- key does not becomes dandling when parallel bucket is deleted + +Note: Volume lock is not required as key depends on bucket only to retrieve information and bucket as parent. Review Comment: Usually, a lower level lock requires locking all its ancestors; e.g. see https://en.wikipedia.org/wiki/Multiple_granularity_locking If it does not have the volume lock, what if the volume is renamed? ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. Volume Strip Lock: locking for volume +2. Bucket Strip Lock: locking for bucket +3. Key Strip Lock: Locking for key + +**Note**: Multiple keys locking (like delete multiple keys or rename operation), lock needs to be taken in order, i.e. using StrippedLocking order to avoid deadlock. + +Stripped locking ordering: +- Strip lock is obtained over a hash bucket. 
+- All keys needs to be ordered with hash bucket +- And then need take lock in sequence order + +## OBS operation +Bucket read lock will be there default. This is to ensure: +- key operation uses updated bucket acl, quota and other properties +- key does not becomes dandling when parallel bucket is deleted + +Note: Volume lock is not required as key depends on bucket only to retrieve information and bucket as parent. + +For key operations in OBS buckets, the following concurrency control is proposed: + +| API Name | Locking Key | Notes | +|-------------------------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------| +| CreateKey | Bucket Read Lock, `No Lock` for key | Key can be created parallel by client in open key table, so do not need key lock | +| CommitKey | Bucket Read, Key Write lock | Avoid parallel key commit by different client, otherwise it can leave dangling blocks on overwrite | +| InitiateMultiPartUpload | Bucket Read Lock, `No Lock` for key | Key can be created parallel by client with different uploadId in open key table, so do not need key lock | +| CommitMultiPartUpload | WriteLock: PartKey Name | Avoid same part commit in parallel, else it can leave dangling blocks on overwrite | +| CompleteMultiPartUpload | Bucket Read, Key Write lock | Avoid parallel multi-part upload complete with different uploadId, else overwrite can cause dangling blocks | +| AbortMultiPartUpload | Bucket Read, Key Write lock | Need to avoid other operation in parallel like commit part | +| DeleteKey | Bucket Read, Key Write lock | Avoid create and delete in parallel | +| DeleteKeys | Bucket Read, Key Write with ordered lock | Avoid create and delete in parallel, ordered key locks to avoid deadlock | +| RenameKey | Bucket Read, Keys Write with ordered lock | lock in order to delete key from original location and move to new location | +| SetAcl | Bucket Read, Key Write 
lock | Avoid parallel delete key | +| AddAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| RemoveAcl | Bucket Read, Key Write lock | Avoid parallel delete key | +| AllocateBlock | Bucket Read, Key Write lock | Need lock key to avoid wrong update of key, if same client does parallel allocate | +| SetTimes | Bucket Read, Key Write lock | Avoid parallel delete key | + +Batch Operation: +1. deleteKeys: batch will be divided to multiple threads in Execution Pool to run parallel calling DeleteKey +2. RenameKeys: This is `depreciated`, but for compatibility, will be divided to multiple threads in Execution Pool to run parallel calling RenameKey + +For batch operation, atomicity is not guranteed for above api, and same is behavior for s3 perspective. Review Comment: > ... atomicity is not guranteed for above api ... I guess you mean atomicity is not guaranteed for the batch but it is guaranteed for the above API. No? ########## hadoop-hdds/docs/content/design/leader-execution/obs-locking.md: ########## @@ -0,0 +1,97 @@ +--- +title: Ozone Granular locking for OBS bucket +summary: Granular locking for OBS bucket +date: 2025-01-06 +jira: HDDS-11898 +status: draft +author: Sumit Agrawal +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# OBS locking + +OBS case just involves volume, bucket and key. So this is more simplified in terms of locking. + +There will be: +1. 
Volume Strip Lock: locking for volume Review Comment: Strip on what? I guess the answer is name? Why name but not id?
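To make the questions about "ordered with hash bucket" and "sequence order" concrete, here is a minimal sketch of one possible reading of the quoted design text: each key name hashes to a stripe index, and a multi-key operation takes the write locks of the distinct stripes in ascending index order. This is an assumption about the intent, not the implementation in the PR. The class `OrderedStripedLocks`, its method names, and striping on the key name rather than an id are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; not a class from the Ozone code base. */
final class OrderedStripedLocks {
  private final ReentrantReadWriteLock[] stripes;

  OrderedStripedLocks(int stripeCount) {
    stripes = new ReentrantReadWriteLock[stripeCount];
    for (int i = 0; i < stripeCount; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
  }

  /** Maps a key name to its stripe ("hash bucket") index. Assumes striping on the name. */
  int stripeIndex(String keyName) {
    return Math.floorMod(keyName.hashCode(), stripes.length);
  }

  /** Distinct stripe indices for the given keys, in ascending order. */
  List<Integer> orderedStripes(Collection<String> keyNames) {
    TreeSet<Integer> sorted = new TreeSet<>();
    for (String keyName : keyNames) {
      sorted.add(stripeIndex(keyName));
    }
    return new ArrayList<>(sorted);
  }

  /** Takes the write lock of every involved stripe in ascending index order. */
  List<Integer> writeLockAll(Collection<String> keyNames) {
    List<Integer> order = orderedStripes(keyNames);
    for (int index : order) {
      stripes[index].writeLock().lock();
    }
    return order;
  }

  /** Releases the stripes returned by writeLockAll, in reverse order. */
  void writeUnlockAll(List<Integer> held) {
    for (int i = held.size() - 1; i >= 0; i--) {
      stripes[held.get(i)].writeLock().unlock();
    }
  }
}
```

Under this reading, two requests that touch the same keys in a different order (for example DeleteKeys on `a, b` and on `b, a`) acquire the stripes in the same sequence, which is what rules out the deadlock. Keys that collide on one stripe share a single lock, so the stripe is locked once per request.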
