[ https://issues.apache.org/jira/browse/HDDS-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059021#comment-18059021 ]

Abhishek Pal commented on HDDS-10611:
-------------------------------------

Hi [~Nicholas Niu], I would like to work on this.
I am starting on the design document for it.

 

cc [~Sammi] [~rakeshr]

> Optimization of gc pressure on om due to handling of mpu large files
> --------------------------------------------------------------------
>
>                 Key: HDDS-10611
>                 URL: https://issues.apache.org/jira/browse/HDDS-10611
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: GuoHao
>            Priority: Major
>
> When multipart upload (MPU) is used for large files, the OM repeatedly
> serializes and deserializes large objects, which causes excessive GC
> pressure on the OM.
>  
> For example, uploading a 200GB file via MPU with 5MB parts means calling
> `upload part` 40960 times.
> 1. Each time an `S3MultipartUploadCommitPartRequest` is processed, the OM
> reads the record from MultipartInfoTable, deserializes it, adds the new
> part information, then serializes it and writes it back to
> MultipartInfoTable (sketched below, after the quoted description).
> 2. By the time roughly 100GB has been transferred, the MultipartInfo object
> is already very large, which hurts OM garbage collection.
> 3. The current plan is to split the contents of MultipartInfoTable: store
> the per-part information in a separate table, which reduces how often large
> objects are serialized and deserialized, and merge the part information
> from the new table only in `S3MultipartUploadCompleteRequest` (also
> sketched below).
>  
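As a starting point for the design doc, here is a minimal Java sketch of the read-modify-write cycle in point 1. The table and record are simplified stand-ins (a plain map of serialized bytes), not the real MultipartInfoTable / OmMultipartKeyInfo code paths; it only illustrates why the work done per commit, and the garbage produced per commit, grow with the number of parts already committed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommitPartCurrentFlowSketch {

  // Stand-in for MultipartInfoTable: multipart key -> whole serialized record.
  static final Map<String, byte[]> multipartInfoTable = new HashMap<>();

  // Every upload-part commit does a full read-modify-write of that record.
  static void commitPart(String multipartKey, String partInfo) {
    // 1. Read and deserialize the whole record (it grows with every part).
    byte[] stored = multipartInfoTable.get(multipartKey);
    List<String> parts = new ArrayList<>();
    if (stored != null) {
      parts.addAll(Arrays.asList(
          new String(stored, StandardCharsets.UTF_8).split(",")));
    }
    // 2. Append the single new part.
    parts.add(partInfo);
    // 3. Re-serialize the whole record and write it back; the old bytes and
    //    the intermediate list immediately become garbage.
    multipartInfoTable.put(multipartKey,
        String.join(",", parts).getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // 200 GB at 5 MB per part = 40960 upload-part calls. By commit i the
    // record already holds i - 1 parts, so roughly 40960 * 40961 / 2 part
    // entries get (de)serialized across the upload, which is the churn
    // behind the GC pressure described above.
    int totalParts = 200 * 1024 / 5;
    System.out.println("upload-part calls: " + totalParts);
    commitPart("vol/bucket/key/uploadId", "part-1");
    commitPart("vol/bucket/key/uploadId", "part-2");
  }
}
```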

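And a matching sketch, under the same simplifying assumptions, of the split proposed in point 3: each committed part is written as its own small row in a separate per-part table, and the parts are merged only once, at complete time. The table name and key layout below are illustrative assumptions, not the actual schema; the real table would presumably be a RocksDB column family scanned by key prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CommitPartSplitTableSketch {

  // Hypothetical per-part table: "multipartKey/partNumber" -> one small
  // serialized part. A TreeMap stands in for a sorted RocksDB prefix scan.
  static final TreeMap<String, byte[]> partInfoTable = new TreeMap<>();

  // Commit now serializes only the new part, so the work per call stays
  // constant no matter how many parts were committed before it.
  static void commitPart(String multipartKey, int partNumber, String partInfo) {
    String rowKey = String.format("%s/%010d", multipartKey, partNumber);
    partInfoTable.put(rowKey, partInfo.getBytes(StandardCharsets.UTF_8));
  }

  // Complete merges the parts exactly once with a single range scan.
  // '0' is the ASCII character right after '/', so [key + "/", key + "0")
  // covers every row whose key starts with key + "/".
  static List<String> completeUpload(String multipartKey) {
    List<String> parts = new ArrayList<>();
    for (Map.Entry<String, byte[]> e : partInfoTable
        .subMap(multipartKey + "/", multipartKey + "0").entrySet()) {
      parts.add(new String(e.getValue(), StandardCharsets.UTF_8));
    }
    return parts;
  }

  public static void main(String[] args) {
    commitPart("vol/bucket/key/uploadId", 1, "part-1");
    commitPart("vol/bucket/key/uploadId", 2, "part-2");
    System.out.println(completeUpload("vol/bucket/key/uploadId"));
  }
}
```

With this layout the large merged object exists only briefly, inside the single complete request, instead of being rebuilt on all 40960 commits.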


