[ 
https://issues.apache.org/jira/browse/HDDS-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDDS-8238:
-----------------------------
    Description: 
When using multipart upload to upload a 1000-part key, the OM RocksDB log needs 
733.15 MB of disk space. For a 10,000-part key, it can use 71.5 GB. The disk 
space usage for the log is O(n^2) for an n-part key.

MultipartKeyInfo has a list of PartKeyInfo (partKeyInfoList). Each PartKeyInfo 
has a KeyInfo, and the size of each KeyInfo is ~1.5 KB.
{code:java}
//OmClientProtocol.proto
message MultipartKeyInfo {
    required string uploadID = 1;
    ...
    repeated PartKeyInfo partKeyInfoList = 5;
    ...
}

message PartKeyInfo {
    required string partName = 1;
    required uint32 partNumber = 2;
    required KeyInfo partKeyInfo = 3; // the size of a KeyInfo is ~1.5 KB
}
{code}
Multipart upload repeatedly puts the same key with a MultipartKeyInfo object 
for each part.
 * When uploading the first part, partKeyInfoList has 1 element.
 * When uploading the second part, partKeyInfoList has 2 elements.
 * When uploading the third part, partKeyInfoList has 3 elements.
 * ...
 * When uploading the 1000th part, partKeyInfoList has 1000 elements.

For each part, the whole partKeyInfoList is written to the RocksDB log. The 
number of PartKeyInfo objects written to the RocksDB log for a 1000-part key is
 - (1 + 1000)*1000/2 = 500,500 objects

and the size is 1.5 KB * 500,500 = 733.15 MB.
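
The arithmetic above can be reproduced with a small sketch. This is only an estimate under the stated assumption that each PartKeyInfo is dominated by its ~1.5 KB KeyInfo; the class and method names (MpuLogEstimate, partKeyInfoWrites, estimatedLogBytes) are hypothetical and not part of Ozone.

```java
public class MpuLogEstimate {
    /** Total PartKeyInfo objects written to the log over all part uploads: 1 + 2 + ... + parts. */
    public static long partKeyInfoWrites(long parts) {
        return parts * (parts + 1) / 2;
    }

    /** Estimated log size in bytes, assuming every PartKeyInfo costs about keyInfoBytes. */
    public static double estimatedLogBytes(long parts, double keyInfoBytes) {
        return partKeyInfoWrites(parts) * keyInfoBytes;
    }

    public static void main(String[] args) {
        double keyInfoBytes = 1.5 * 1024; // assumption from the description: ~1.5 KB per KeyInfo
        for (long parts : new long[] {1_000, 10_000}) {
            double mb = estimatedLogBytes(parts, keyInfoBytes) / (1024.0 * 1024.0);
            System.out.printf("%d parts: %d objects, %.2f MB%n",
                    parts, partKeyInfoWrites(parts), mb);
        }
    }
}
```

For 1000 parts this gives 500,500 objects and about 733.15 MB; for 10,000 parts it gives 50,005,000 objects, which is about 71.5 GB.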

Note that only the RocksDB log is large. The RocksDB tables remain small 
since the partKeyInfoList is overwritten for each part.
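
The difference between log growth and table size can be illustrated with a toy model. It does not use any RocksDB or Ozone API: a HashMap stands in for the table, a counter stands in for the log, and the names (MpuWritePatternSketch, simulate, "uploadKey") are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MpuWritePatternSketch {
    /** Returns {list elements appended to the log, list elements left in the table}. */
    public static long[] simulate(int parts) {
        Map<String, List<Integer>> table = new HashMap<>(); // stand-in for the RocksDB table
        long logEntries = 0;                                 // stand-in for the RocksDB log
        List<Integer> partKeyInfoList = new ArrayList<>();

        for (int partNumber = 1; partNumber <= parts; partNumber++) {
            partKeyInfoList.add(partNumber);
            // Each put rewrites the whole list under the same key, replacing the old value ...
            table.put("uploadKey", new ArrayList<>(partKeyInfoList));
            // ... while the log records the full value of every put.
            logEntries += partKeyInfoList.size();
        }

        List<Integer> stored = table.get("uploadKey");
        long tableEntries = (stored == null) ? 0 : stored.size();
        return new long[] {logEntries, tableEntries};
    }

    public static void main(String[] args) {
        long[] result = simulate(1000);
        System.out.println("log entries = " + result[0] + ", table entries = " + result[1]);
    }
}
```

In this model the table holds only the latest list (n elements), while the log accumulates n(n+1)/2 elements.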

  was:
When using multipart upload to upload a 1000-part key. OM RocksDb needs to use 
733.15 MB disk space. For a 10,000-part key, it can use 71.5 GB .  The disk 
space usage is O(n^2) for a n-part key.

MultipartKeyInfo has a list of PartKeyInfo (partKeyInfoList) while each 
PartKeyInfo has a KeyInfo and the size of each KeyInfo is ~1.5 KB.
{code:java}
message MultipartKeyInfo {
    required string uploadID = 1;
    ...
    repeated PartKeyInfo partKeyInfoList = 5;
    ...
}

message PartKeyInfo {
    required string partName = 1;
    required uint32 partNumber = 2;
    required KeyInfo partKeyInfo = 3; // the size of a KeyInfo is ~1.5 KB
}
{code}
Multipart upload repeatedly put the same key with a MultipartKeyInfo object for 
each part.
 # When uploading the first part, partKeyInfoList has 1 element.
 # When uploading the second part, partKeyInfoList has 2 elements.
 # When uploading the third part, partKeyInfoList has 3 elements.
...

1000. When uploading the 1000th part, partKeyInfoList has 1000 elements.

Then, the number of PartKeyInfo objects uploaded is
 - (1 + 1000)*1000/2 = 500,500 and
 - the size is 1.5 KB * 500,500 = 733.15 MB.


> Multipart upload generates a huge OM rocksdb log
> ------------------------------------------------
>
>                 Key: HDDS-8238
>                 URL: https://issues.apache.org/jira/browse/HDDS-8238
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: OM
>            Reporter: Tsz-wo Sze
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)