Kostas Kloudas created FLINK-10963:
--------------------------------------
Summary: Cleanup small objects uploaded to S3 as independent objects
Key: FLINK-10963
URL: https://issues.apache.org/jira/browse/FLINK-10963
Project: Flink
Issue Type: Sub-task
Components: filesystem-connector
Affects Versions: 1.7.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas
Fix For: 1.7.1
The S3 {{RecoverableWriter}} uses the Multipart Upload (MPU) feature of S3 to
upload the different part files. This means that a large part file is split
into chunks of at least 5MB, each of which is uploaded independently as soon as
it is ready.
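The splitting described above can be sketched in Java as follows. `PartBuffer` and its methods are hypothetical and not Flink's actual S3 writer. Instead of calling S3's upload-part operation, the sketch only records the size of each part it would upload. The minimum part size is a constructor parameter so the logic can be tried with small values. `MIN_PART_SIZE` holds 5 MiB, which S3 documents as the minimum size for every MPU part except the last.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer that cuts a stream of bytes into MPU-sized parts. */
class PartBuffer {
    // S3 requires every MPU part except the last one to be at least 5 MiB.
    static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    private final int minPartSize;
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    // Sizes of the parts that would have been uploaded (recorded, not sent).
    private final List<Integer> uploadedPartSizes = new ArrayList<>();

    PartBuffer(int minPartSize) {
        this.minPartSize = minPartSize;
    }

    void write(byte[] data) {
        current.write(data, 0, data.length);
        // Once the buffer reaches the minimum size it can become an MPU part,
        // so it is "uploaded" independently and a fresh buffer is started.
        if (current.size() >= minPartSize) {
            uploadedPartSizes.add(current.size());
            current.reset();
        }
    }

    /** Bytes that are still too few to form an MPU part on their own. */
    int bufferedBytes() {
        return current.size();
    }

    List<Integer> uploadedPartSizes() {
        return uploadedPartSizes;
    }
}
```

A real writer would construct it with `new PartBuffer(PartBuffer.MIN_PART_SIZE)`. Whatever remains in `bufferedBytes()` when a checkpoint barrier arrives is exactly the "small part" that the next paragraph is about.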
This 5MB minimum requires special handling of parts that are smaller than 5MB
when a checkpoint barrier arrives. Such small parts are uploaded as independent
objects (not associated with an active MPU). This way, when Flink needs to
restore, it simply downloads them and resumes writing to them.
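The checkpoint-time handling can be sketched as below. The sketch assumes a hypothetical `BlobStore` interface in place of a real S3 client and a hypothetical key scheme (`<prefix>/tmp-<uuid>`). `BlobStore`, `InMemoryBlobStore`, and `SmallPartHandler` are illustrative names, not Flink classes. Buffered bytes below the minimum part size are written as a standalone object, and the object's key is what would go into the checkpointed state. Restoring reads that object back so writing can continue on top of it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical minimal object-store abstraction standing in for S3. */
interface BlobStore {
    void put(String key, byte[] data);

    byte[] get(String key);
}

/** In-memory BlobStore, only for trying the sketch out. */
class InMemoryBlobStore implements BlobStore {
    final Map<String, byte[]> objects = new HashMap<>();

    @Override
    public void put(String key, byte[] data) {
        objects.put(key, data.clone());
    }

    @Override
    public byte[] get(String key) {
        return objects.get(key).clone();
    }
}

/** Hypothetical helper for bytes still buffered when a checkpoint barrier arrives. */
final class SmallPartHandler {
    private SmallPartHandler() {}

    /**
     * Uploads the pending bytes as an independent object (outside any MPU)
     * and returns its key for the checkpointed state; empty if nothing is buffered.
     */
    static Optional<String> persistOnCheckpoint(
            BlobStore store, String keyPrefix, byte[] pending, int minPartSize) {
        if (pending.length == 0) {
            return Optional.empty();
        }
        if (pending.length >= minPartSize) {
            // Large enough to be a regular MPU part, so it does not belong here.
            throw new IllegalArgumentException("pending data should be an MPU part");
        }
        String key = keyPrefix + "/tmp-" + UUID.randomUUID();
        store.put(key, pending);
        return Optional.of(key);
    }

    /** On restore, downloads the small object so writing can resume on top of it. */
    static byte[] restore(BlobStore store, Optional<String> smallObjectKey) {
        return smallObjectKey.map(store::get).orElse(new byte[0]);
    }
}
```

Note that nothing in this sketch ever deletes the object written by `persistOnCheckpoint`. That gap is the leak this issue describes.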
These small objects are currently not cleaned up, so they accumulate and waste
storage space on S3.
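One possible cleanup strategy is sketched below. It relies on an assumption that this issue does not state: a small object is no longer needed once a newer checkpoint has completed, because that checkpoint references its own small object instead. `SmallObjectCleaner` is hypothetical. The actual deletion is delegated to a caller-supplied function, so the sketch does not depend on any particular S3 client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/**
 * Hypothetical tracker that deletes small S3 objects once the checkpoint
 * that referenced them has been subsumed by a newer completed checkpoint.
 */
class SmallObjectCleaner {
    // checkpoint ID -> keys of the small objects written for that checkpoint
    private final TreeMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
    private final Consumer<String> deleteObject;

    SmallObjectCleaner(Consumer<String> deleteObject) {
        this.deleteObject = deleteObject;
    }

    /** Records that a small object was uploaded while taking the given checkpoint. */
    void register(long checkpointId, String objectKey) {
        pendingByCheckpoint.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(objectKey);
    }

    /** Deletes the objects of all checkpoints older than the completed one. */
    void notifyCheckpointComplete(long completedId) {
        // headMap(completedId) is a live view of the entries with IDs strictly
        // lower than completedId, so clearing it also removes them from the map.
        Map<Long, List<String>> subsumed = pendingByCheckpoint.headMap(completedId);
        for (List<String> keys : subsumed.values()) {
            keys.forEach(deleteObject);
        }
        subsumed.clear();
    }

    int trackedCheckpoints() {
        return pendingByCheckpoint.size();
    }
}
```

The objects of the just-completed checkpoint are kept, since a restore would start from that checkpoint. If older checkpoints can also be used for recovery (for example retained ones), this eager deletion would be unsafe and would need to follow their lifecycle instead.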