[ 
https://issues.apache.org/jira/browse/HDDS-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuoHao updated HDDS-10611:
--------------------------
    Description: 
During a multipart upload (MPU) of a large file, OM repeatedly serializes and 
deserializes a large object, which puts excessive GC pressure on OM.

For example, uploading a 200GB file via MPU with 5MB parts takes 40960 
`upload part` calls.
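
The part count above can be checked with a short calculation. This is only an illustrative sketch: the class and method names are made up for this example, and it assumes binary units (1GB = 1024MB), which is what yields exactly 40960.

```java
/** Illustrative helper, not part of Ozone: counts the parts of a multipart upload. */
final class MpuPartCount {
  /** Number of parts needed, rounding up so a final partial part is counted. */
  static long partCount(long fileSizeBytes, long partSizeBytes) {
    return (fileSizeBytes + partSizeBytes - 1) / partSizeBytes;
  }

  public static void main(String[] args) {
    long mib = 1024L * 1024L;   // assumption: "MB" means 2^20 bytes
    long gib = 1024L * mib;     // assumption: "GB" means 2^30 bytes
    // 200GB file, 5MB per part
    System.out.println(partCount(200 * gib, 5 * mib));
  }
}
```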

1. Each time OM processes an `S3MultipartUploadCommitPartRequest`, it reads the 
upload's entry from MultipartInfoTable, deserializes it, adds the new part 
information, then serializes it and writes it back to MultipartInfoTable.

2. By the time about 100GB has been transferred, the MultipartInfo object is 
already very large, and rewriting it on every part commit affects GC on OM.

3. The current plan is to split the contents of MultipartInfoTable and store the 
part information separately in another table. This reduces the number of times 
large objects are serialized and deserialized: the part information in the new 
table only needs to be merged in `S3MultipartUploadCompleteRequest`.
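
To make the difference between the two layouts concrete, here is a toy model. It is not Ozone code: in-memory maps stand in for the OM tables, the table name `multipartPartTable`, the key format, and all method names are hypothetical, and "serialization cost" is approximated by counting the part entries written on each commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the current and proposed layouts; all names are hypothetical. */
final class MpuTableSketch {
  // Current layout: one value per upload that holds the full part list.
  private final Map<String, List<Integer>> multipartInfoTable = new HashMap<>();
  // Proposed layout: one small row per part, sorted by key.
  private final TreeMap<String, Integer> multipartPartTable = new TreeMap<>();

  long entriesWrittenCurrent = 0;
  long entriesWrittenSplit = 0;

  /** Current behavior: read the whole list, append, write the whole list back. */
  void commitPartCurrent(String uploadId, int partNumber) {
    List<Integer> parts = new ArrayList<>(
        multipartInfoTable.getOrDefault(uploadId, Collections.emptyList()));
    parts.add(partNumber);
    multipartInfoTable.put(uploadId, parts);
    entriesWrittenCurrent += parts.size();   // cost grows with every part
  }

  /** Proposed behavior: write only the row for the new part. */
  void commitPartSplit(String uploadId, int partNumber) {
    multipartPartTable.put(partKey(uploadId, partNumber), partNumber);
    entriesWrittenSplit += 1;                // constant cost per part
  }

  /** On complete: merge the part rows once, using a key-prefix range scan. */
  List<Integer> completeSplit(String uploadId) {
    String prefix = uploadId + "/";
    return new ArrayList<>(
        multipartPartTable.subMap(prefix, prefix + Character.MAX_VALUE).values());
  }

  /** Zero-padded part number so that keys sort in part order. */
  static String partKey(String uploadId, int partNumber) {
    return String.format("%s/%05d", uploadId, partNumber);
  }

  public static void main(String[] args) {
    MpuTableSketch sketch = new MpuTableSketch();
    for (int part = 1; part <= 1000; part++) {
      sketch.commitPartCurrent("upload-1", part);
      sketch.commitPartSplit("upload-1", part);
    }
    System.out.println("current layout entries written: " + sketch.entriesWrittenCurrent);
    System.out.println("split layout entries written:   " + sketch.entriesWrittenSplit);
    System.out.println("parts merged on complete:       " + sketch.completeSplit("upload-1").size());
  }
}
```

In this model the current layout writes 1 + 2 + ... + N entries over N part commits, so the total grows quadratically with the number of parts, while the split layout writes N rows and touches the full part list only once, at completion.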

 

> Optimization of gc pressure on om due to handling of mpu large files
> --------------------------------------------------------------------
>
>                 Key: HDDS-10611
>                 URL: https://issues.apache.org/jira/browse/HDDS-10611
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: GuoHao
>            Priority: Major


