[
https://issues.apache.org/jira/browse/HDDS-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703560#comment-17703560
]
Tsz-wo Sze edited comment on HDDS-8238 at 3/22/23 9:13 AM:
-----------------------------------------------------------
There are a few possible improvements for the problem:
# Tune RocksDb, e.g. by adjusting the log retention policy or other tricks. Hopefully, that can keep the disk space usage small.
# Deduplicate the ops in RDBBatchOperation (HDDS-8128).
# Change the RocksDb table schema as below:
#* a. When uploading a part, include only the PartKeyInfo for that part instead of a list of PartKeyInfo. When writing the PartKeyInfo to RocksDb, put the PartKeyInfo with a part-specific key.
#* b. KeyInfo has a lot of (duplicated) information such as volumeName, bucketName, etc., as shown below. Do not use KeyInfo in PartKeyInfo; pick only the necessary fields.
{code}
//OmClientProtocol.proto
message KeyInfo { //1.5 KB each object
required string volumeName = 1;
required string bucketName = 2;
required string keyName = 3;
required uint64 dataSize = 4;
required hadoop.hdds.ReplicationType type = 5;
optional hadoop.hdds.ReplicationFactor factor = 6;
repeated KeyLocationList keyLocationList = 7;
required uint64 creationTime = 8;
required uint64 modificationTime = 9;
optional uint64 latestVersion = 10;
repeated hadoop.hdds.KeyValue metadata = 11;
optional FileEncryptionInfoProto fileEncryptionInfo = 12;
repeated OzoneAclInfo acls = 13;
optional uint64 objectID = 14;
optional uint64 updateID = 15;
optional uint64 parentID = 16;
optional hadoop.hdds.ECReplicationConfig ecReplicationConfig = 17;
optional FileChecksumProto fileChecksum = 18;
}
{code}
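Proposal 3a above could be sketched as follows. This is only an illustrative sketch, not Ozone's actual API: the key layout, helper name, and the TreeMap stand-in for a RocksDb table are all assumptions.
```java
// Hypothetical sketch of proposal 3a: store each PartKeyInfo under its own
// part-specific key instead of rewriting the whole partKeyInfoList per part.
// The key layout and names below are illustrative, not Ozone's actual schema.
import java.util.Map;
import java.util.TreeMap;

public class PartKeyLayout {
  // Build a part-specific key by appending the part number to the
  // multipart key (layout is an assumption for illustration).
  static String partKey(String multipartKey, int partNumber) {
    return multipartKey + "/" + partNumber;
  }

  public static void main(String[] args) {
    // Sorted map stands in for a RocksDb table (keys are stored sorted).
    Map<String, String> db = new TreeMap<>();
    String multipartKey = "/vol1/bucket1/key1/upload-42";

    // Each part commit now writes O(1) data: only that part's info.
    for (int part = 1; part <= 3; part++) {
      db.put(partKey(multipartKey, part), "PartKeyInfo-" + part);
    }

    // All parts of the upload can still be found with a prefix scan.
    db.keySet().stream()
        .filter(k -> k.startsWith(multipartKey + "/"))
        .forEach(System.out::println);
  }
}
```
With this layout, completing the upload does a prefix scan instead of reading one large value, and the log grows linearly in the number of parts.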
> Multipart upload generates a huge OM rocksdb log
> ------------------------------------------------
>
> Key: HDDS-8238
> URL: https://issues.apache.org/jira/browse/HDDS-8238
> Project: Apache Ozone
> Issue Type: Improvement
> Components: OM
> Reporter: Tsz-wo Sze
> Priority: Major
>
> When using multipart upload to upload a 1000-part key, the OM RocksDb log
> needs 733.15 MB of disk space. For a 10,000-part key, it can use 71.5 GB. The
> disk space usage for the log is O(n^2) for an n-part key.
> MultipartKeyInfo has a list of PartKeyInfo (partKeyInfoList) while each
> PartKeyInfo has a KeyInfo and the size of each KeyInfo is ~1.5 KB.
> {code:java}
> //OmClientProtocol.proto
> message MultipartKeyInfo {
> required string uploadID = 1;
> ...
> repeated PartKeyInfo partKeyInfoList = 5;
> ...
> }
> message PartKeyInfo {
> required string partName = 1;
> required uint32 partNumber = 2;
> required KeyInfo partKeyInfo = 3; // the size of a KeyInfo is ~1.5 KB
> }
> {code}
> Multipart upload repeatedly puts the same key with a MultipartKeyInfo object
> for each part.
> * When uploading the first part, partKeyInfoList has 1 element.
> * When uploading the second part, partKeyInfoList has 2 elements.
> * When uploading the third part, partKeyInfoList has 3 elements.
> ...
> * When uploading the 1000th part, partKeyInfoList has 1000 elements.
> The partKeyInfoList is written to the RocksDb log for each part. The total
> number of PartKeyInfo objects written to the RocksDb log is
> - (1 + 1000)*1000/2 = 500,500 objects
> and the total size is 1.5 KB * 500,500 = 733.15 MB.
> Note that only the RocksDb log size is large. The size of the RocksDb tables
> is small since the partKeyInfoList is overwritten again and again for each
> part.
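The arithmetic in the description above can be checked with a short sketch (the per-object size of ~1.5 KB is taken from the description; the class and method names are illustrative only):
```java
// Sketch of the log-size arithmetic: for an n-part upload, committing part i
// rewrites a list of i PartKeyInfo objects (~1.5 KB each), so the RocksDb log
// accumulates 1 + 2 + ... + n = n*(n+1)/2 objects in total.
public class MultipartLogSize {
  static long objectsWritten(long n) {
    return n * (n + 1) / 2;
  }

  static double logSizeMB(long n, double kbPerObject) {
    return objectsWritten(n) * kbPerObject / 1024.0;
  }

  public static void main(String[] args) {
    // 1000 parts: 500,500 objects, ~733.15 MB
    System.out.printf("1000 parts: %d objects, %.2f MB%n",
        objectsWritten(1000), logSizeMB(1000, 1.5));
    // 10,000 parts: ~71.5 GB
    System.out.printf("10000 parts: %.1f GB%n",
        logSizeMB(10000, 1.5) / 1024.0);
  }
}
```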