[
https://issues.apache.org/jira/browse/FLINK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-10203:
-----------------------------------
Labels: auto-unassigned pull-request-available stale-major (was:
auto-unassigned pull-request-available)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Major but is unassigned and neither itself nor its Sub-Tasks have been updated
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If
this ticket is a Major, please either assign yourself or give an update.
Afterwards, please remove the label or in 7 days the issue will be deprioritized.
> Support truncate method for old Hadoop versions in
> HadoopRecoverableFsDataOutputStream
> --------------------------------------------------------------------------------------
>
> Key: FLINK-10203
> URL: https://issues.apache.org/jira/browse/FLINK-10203
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream, Connectors / FileSystem
> Affects Versions: 1.6.0, 1.6.1, 1.7.0
> Reporter: Artsem Semianenka
> Priority: Major
> Labels: auto-unassigned, pull-request-available, stale-major
> Attachments: legacy truncate logic.pdf
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The new StreamingFileSink (introduced in Flink 1.6) uses the
> HadoopRecoverableFsDataOutputStream wrapper to write data to HDFS.
> HadoopRecoverableFsDataOutputStream wraps FSDataOutputStream so that, after a
> failure, writing can resume from a certain point in the file. To achieve this
> recovery functionality, HadoopRecoverableFsDataOutputStream uses the
> "truncate" method, which was introduced only in Hadoop 2.7.
> FLINK-14170 has enabled the usage of StreamingFileSink with
> OnCheckpointRollingPolicy, but it is still not possible to use
> StreamingFileSink with DefaultRollingPolicy, which makes writing data
> to HDFS impractical at scale for HDFS < 2.7.
> Unfortunately, there are a few official Hadoop distributions whose latest
> versions still use Hadoop 2.6 (namely Cloudera and Pivotal HD). As
> a result, Flink's Hadoop connector can't work with these distributions.
> Flink declares that it supports Hadoop from version 2.4.0 upwards
> ([https://ci.apache.org/projects/flink/flink-docs-release-1.6/start/building.html#hadoop-versions]).
> I guess we should emulate the functionality of the "truncate" method for older
> Hadoop versions.
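
One way the "truncate" emulation proposed above might look is sketched here. This is only an illustration under stated assumptions, not the approach taken in the attached PDF or in any Flink patch: it uses java.nio on a local file system as a stand-in for HDFS, and the class name TruncateEmulation, the method truncateByCopy, and the ".truncated.tmp" suffix are all hypothetical. The idea is to copy the first `length` bytes into a sibling temporary file and then move that file over the original.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Flink or Hadoop. */
final class TruncateEmulation {

    private TruncateEmulation() {}

    /**
     * Shortens {@code file} to {@code length} bytes using only
     * create, read, write and move, i.e. without a native truncate call.
     */
    static void truncateByCopy(Path file, long length) throws IOException {
        long size = Files.size(file);
        if (length < 0 || length > size) {
            throw new IllegalArgumentException(
                    "length " + length + " is outside [0, " + size + "]");
        }

        // Write the retained prefix to a sibling temp file first
        // (the suffix is an arbitrary, assumed naming choice).
        Path tmp = file.resolveSibling(file.getFileName() + ".truncated.tmp");
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int toRead = (int) Math.min(buffer.length, remaining);
                int read = in.read(buffer, 0, toRead);
                if (read < 0) {
                    throw new IOException("Unexpected end of file while copying prefix");
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }

        // Swap the shortened copy into the place of the original file.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

On HDFS the final swap would probably need a delete followed by a rename, which is not atomic, so a real implementation would have to handle a crash between the two steps during recovery. Copying the prefix also costs time proportional to the retained length, unlike the native "truncate" available from Hadoop 2.7.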
--
This message was sent by Atlassian Jira
(v8.3.4#803005)