[
https://issues.apache.org/jira/browse/HDDS-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059021#comment-18059021
]
Abhishek Pal commented on HDDS-10611:
-------------------------------------
Hi [~Nicholas Niu], I would like to work on this.
I will start by drafting a design document for it.
cc [~Sammi] [~rakeshr]
> Optimization of GC pressure on OM due to handling of large MPU files
> --------------------------------------------------------------------
>
> Key: HDDS-10611
> URL: https://issues.apache.org/jira/browse/HDDS-10611
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: GuoHao
> Priority: Major
>
> When uploading large files via multipart upload (MPU), OM repeatedly
> serializes and deserializes large objects, causing excessive GC pressure.
>
> For example, uploading a 200 GB file via MPU with 5 MB parts requires
> calling `upload part` 40960 times (200 GB / 5 MB = 40960).
> 1. Each time an `S3MultipartUploadCommitPartRequest` is processed, OM reads
> the entry from MultipartInfoTable, deserializes it, appends the new part
> information, then serializes the whole object and writes it back to
> MultipartInfoTable (see the first sketch after this list).
> 2. By the time 100 GB has been transferred, the MultipartInfo object is
> already very large, which hurts OM GC.
> 3. The current plan is to split the contents of MultipartInfoTable: store
> the part information separately in another table, which reduces how often
> large objects are serialized and deserialized. The per-part information in
> the new table is merged back only while processing
> `S3MultipartUploadCompleteRequest` (see the second sketch after this list).
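>
> A minimal sketch of the current commit-part path, using hypothetical
> stand-in types (a plain Map for MultipartInfoTable, simple
> serialize/deserialize helpers) rather than the real OM classes, to show
> where the large-object churn comes from:
>
> ```java
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> class CurrentCommitPathSketch {
>   // Stand-ins for the serialized value stored in MultipartInfoTable.
>   static byte[] serialize(List<String> partInfos) {
>     return String.join(",", partInfos).getBytes();
>   }
>
>   static List<String> deserialize(byte[] value) {
>     List<String> parts = new ArrayList<>();
>     if (value.length > 0) {
>       for (String p : new String(value).split(",")) {
>         parts.add(p);
>       }
>     }
>     return parts;
>   }
>
>   // Every commit-part re-reads and re-writes the whole, ever-growing value.
>   static void commitPart(Map<String, byte[]> multipartInfoTable,
>                          String multipartKey, String newPartInfo) {
>     byte[] old = multipartInfoTable.getOrDefault(multipartKey, new byte[0]);
>     List<String> parts = deserialize(old);   // deserialize a large object
>     parts.add(newPartInfo);                  // append one small part
>     multipartInfoTable.put(multipartKey, serialize(parts)); // re-serialize it all
>   }
>
>   public static void main(String[] args) {
>     Map<String, byte[]> multipartInfoTable = new HashMap<>();
>     // 40960 calls for a 200 GB / 5 MB upload; each call handles a value
>     // that already contains every previously committed part.
>     for (int part = 1; part <= 40960; part++) {
>       commitPart(multipartInfoTable, "vol/bucket/key/uploadId", "part-" + part);
>     }
>   }
> }
> ```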
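>
> And a minimal sketch of the proposed split, again with hypothetical names
> (the per-part table and its key layout); the real table and key format are
> to be settled in the design document:
>
> ```java
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
> import java.util.SortedMap;
> import java.util.TreeMap;
>
> class SplitTableSketch {
>   // Hypothetical key layout: "<multipartKey>/<zero-padded part number>",
>   // so a prefix scan returns parts in order.
>   static String partKey(String multipartKey, int partNumber) {
>     return String.format("%s/%05d", multipartKey, partNumber);
>   }
>
>   // Commit-part now writes one small, fixed-size entry; nothing large is
>   // read, deserialized or re-serialized on this hot path.
>   static void commitPart(SortedMap<String, byte[]> partInfoTable,
>                          String multipartKey, int partNumber, byte[] partInfo) {
>     partInfoTable.put(partKey(multipartKey, partNumber), partInfo);
>   }
>
>   // Only complete-MPU pays the cost of assembling the full part list, once,
>   // via a prefix scan over the split table.
>   static List<byte[]> collectParts(SortedMap<String, byte[]> partInfoTable,
>                                    String multipartKey) {
>     String prefix = multipartKey + "/";
>     List<byte[]> parts = new ArrayList<>();
>     for (Map.Entry<String, byte[]> e : partInfoTable.tailMap(prefix).entrySet()) {
>       if (!e.getKey().startsWith(prefix)) {
>         break;
>       }
>       parts.add(e.getValue());
>     }
>     return parts;
>   }
>
>   public static void main(String[] args) {
>     SortedMap<String, byte[]> partInfoTable = new TreeMap<>();
>     for (int part = 1; part <= 40960; part++) {
>       commitPart(partInfoTable, "vol/bucket/key/uploadId", part, new byte[16]);
>     }
>     System.out.println(collectParts(partInfoTable, "vol/bucket/key/uploadId").size());
>   }
> }
> ```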
>