[ https://issues.apache.org/jira/browse/FLINK-12022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-12022:
-----------------------------------
Labels: auto-unassigned stale-major (was: auto-unassigned)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Major but is unassigned, and neither it nor its Sub-Tasks have been updated
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If
this ticket is a Major, please either assign yourself or give an update.
Afterwards, please remove the label, or in 7 days the issue will be deprioritized.
> Enable StreamWriter to update file length on sync flush
> -------------------------------------------------------
>
> Key: FLINK-12022
> URL: https://issues.apache.org/jira/browse/FLINK-12022
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.6.4, 1.7.2
> Reporter: Paul Lin
> Priority: Major
> Labels: auto-unassigned, stale-major
>
> Currently, users of file systems that do not support truncation have to
> fall back on BucketingSink's valid-length file to indicate the checkpointed
> data position. The problem is that, by default, the file length is only
> updated when a block is full or the file is closed. If the job crashes and
> the file is not closed properly, the reported file length lags behind both
> the actual length and the checkpointed valid length. When the job
> restarts, this looks like data loss, because the valid length is larger than
> the reported file length. The situation lasts until the NameNode notices the
> change in the file's block size, which can take half an hour or more.
> So I propose adding an option to StreamWriterBase to update the file length
> on each flush. This can be expensive because it involves the NameNode, so it
> should only be used when strong consistency is needed.
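
The scenario in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use Flink or HDFS, and every name in it (SimulatedFile, SketchWriter, updateLengthOnFlush, simulateCrash, looksLikeDataLoss) is hypothetical and does not come from the issue. On a real HDFS stream, the length update would presumably be requested with HdfsDataOutputStream.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH)); the model below only mimics the effect of such a call.

```java
import java.nio.charset.StandardCharsets;

/** Toy model of the reported-length lag; hypothetical names, not Flink or HDFS code. */
public class SyncFlushLengthSketch {

    /** Stands in for a file whose reported length can lag behind the bytes written. */
    static final class SimulatedFile {
        private long actualLength = 0;   // bytes really written
        private long reportedLength = 0; // what a metadata service would report

        void append(int byteCount) { actualLength += byteCount; }
        void publishLength() { reportedLength = actualLength; }
        long actualLength() { return actualLength; }
        long reportedLength() { return reportedLength; }
    }

    /** Hypothetical writer with the proposed "update length on flush" option. */
    static final class SketchWriter {
        private final SimulatedFile file;
        private final boolean updateLengthOnFlush;

        SketchWriter(SimulatedFile file, boolean updateLengthOnFlush) {
            this.file = file;
            this.updateLengthOnFlush = updateLengthOnFlush;
        }

        void write(String record) {
            file.append(record.getBytes(StandardCharsets.UTF_8).length);
        }

        /** Flushes and returns the position a checkpoint would store as valid length. */
        long flush() {
            if (updateLengthOnFlush) {
                file.publishLength(); // the extra, potentially expensive metadata update
            }
            return file.actualLength();
        }
    }

    /** After a restart, a valid length beyond the reported length looks like data loss. */
    static boolean looksLikeDataLoss(long validLength, long reportedLength) {
        return validLength > reportedLength;
    }

    /** Writes two records, flushes, then "crashes" without closing the file. */
    static long[] simulateCrash(boolean updateLengthOnFlush) {
        SimulatedFile file = new SimulatedFile();
        SketchWriter writer = new SketchWriter(file, updateLengthOnFlush);
        writer.write("record-1\n");
        writer.write("record-2\n");
        long validLength = writer.flush();
        // Crash: the file is never closed, so nothing else publishes the length.
        return new long[] {validLength, file.reportedLength()};
    }

    public static void main(String[] args) {
        for (boolean option : new boolean[] {false, true}) {
            long[] r = simulateCrash(option);
            System.out.println("updateLengthOnFlush=" + option
                    + " validLength=" + r[0]
                    + " reportedLength=" + r[1]
                    + " looksLikeDataLoss=" + looksLikeDataLoss(r[0], r[1]));
        }
    }
}
```

With the option off, the checkpointed valid length exceeds the reported length after the simulated crash; with it on, the two agree, at the cost of one metadata update per flush.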
--
This message was sent by Atlassian Jira
(v8.3.4#803005)