[https://issues.apache.org/jira/browse/HDDS-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Tsz-wo Sze updated HDDS-8128:
-----------------------------
Component/s: db
(was: OM)
Description:
In a multipart upload test, the key "testKey" had 1000 parts of 8 KB each.
The same key was uploaded 10 times sequentially (i.e., each upload overwrote
the previous one) in a newly formatted cluster. The replication factor was 3,
so the total raw size of the key is ~24 MB. After the test completed, the OM
RocksDB used ~7.5 GB.
In this JIRA, we add a cache to RDBBatchOperation for deduplication. Within a
batch, put and delete ops on the same key can be safely deduplicated: only the
last op has to be applied to the db, and all previous ops can be discarded.
was: In a multipart upload test, the key "testKey" had 1000 parts of 8 KB
each. The same key was uploaded 10 times sequentially (i.e., each upload
overwrote the previous one) in a newly formatted cluster. The replication
factor was 3, so the total raw size of the key is ~24 MB. After the test
completed, the OM RocksDB used ~7.5 GB.
Summary: Deduplicate the ops in RDBBatchOperation (was: OM rocksdb
uses a lot of space)
In this JIRA, we focus on RDBBatchOperation deduplication; RDBBatchOperation
is a utility class used throughout Ozone, including in OM, SCM, and DN.
Filed HDDS-8238 for further work specific to OM.
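A minimal Java sketch of the per-batch deduplication described above, assuming
ops are keyed by (table, key) and that a later put or delete on the same key
fully supersedes any earlier op on that key. The class DedupBatchSketch, its
methods, the table name "keyTable", and the in-memory map standing in for
RocksDB are all hypothetical; this is not the actual RDBBatchOperation code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of per-batch op deduplication. Names are illustrative
 * only and are NOT the RDBBatchOperation API.
 */
public class DedupBatchSketch {

  /** A pending op: a put carries a value, a delete carries null. */
  static final class Op {
    final byte[] value;
    Op(byte[] value) { this.value = value; }
    boolean isDelete() { return value == null; }
  }

  // One map per table (column family). ByteBuffer has content-based
  // equals/hashCode; a raw byte[] key would only match by identity.
  private final Map<String, Map<ByteBuffer, Op>> pending = new HashMap<>();
  private int discarded = 0;

  private void record(String table, byte[] key, Op op) {
    Map<ByteBuffer, Op> ops =
        pending.computeIfAbsent(table, t -> new LinkedHashMap<>());
    // Copy the key so later mutation by the caller cannot corrupt the map.
    Op previous = ops.put(ByteBuffer.wrap(key.clone()), op);
    if (previous != null) {
      discarded++; // an earlier op on the same key is superseded
    }
  }

  public void put(String table, byte[] key, byte[] value) {
    record(table, key, new Op(value.clone()));
  }

  public void delete(String table, byte[] key) {
    record(table, key, new Op(null));
  }

  public int discardedOps() { return discarded; }

  /** Applies only the surviving op per key to an in-memory stand-in db. */
  public void commit(Map<String, Map<ByteBuffer, byte[]>> db) {
    for (Map.Entry<String, Map<ByteBuffer, Op>> t : pending.entrySet()) {
      Map<ByteBuffer, byte[]> table =
          db.computeIfAbsent(t.getKey(), k -> new HashMap<>());
      for (Map.Entry<ByteBuffer, Op> e : t.getValue().entrySet()) {
        if (e.getValue().isDelete()) {
          table.remove(e.getKey());
        } else {
          table.put(e.getKey(), e.getValue().value);
        }
      }
    }
    pending.clear();
  }

  public static void main(String[] args) {
    DedupBatchSketch batch = new DedupBatchSketch();
    byte[] k = "testKey".getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i < 3; i++) {
      batch.put("keyTable", k, new byte[] {(byte) i});
    }
    batch.delete("keyTable", k);
    Map<String, Map<ByteBuffer, byte[]>> db = new HashMap<>();
    batch.commit(db);
    System.out.println("discarded=" + batch.discardedOps()
        + " present=" + db.get("keyTable").containsKey(ByteBuffer.wrap(k)));
  }
}
```

Running main prints "discarded=3 present=false": three puts and a delete on
the same key collapse into a single delete before anything reaches the db.
Keying by ByteBuffer rather than byte[] matters here, because Java arrays use
identity equality and two equal keys would otherwise never be deduplicated.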
> Deduplicate the ops in RDBBatchOperation
> ----------------------------------------
>
> Key: HDDS-8128
> URL: https://issues.apache.org/jira/browse/HDDS-8128
> Project: Apache Ozone
> Issue Type: Improvement
> Components: db
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Blocker
> Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.20.10#820010)