[
https://issues.apache.org/jira/browse/FLINK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670237#comment-16670237
]
ASF GitHub Bot commented on FLINK-10203:
----------------------------------------
art4ul commented on issue #6608: [FLINK-10203]Support truncate method for old
Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/6608#issuecomment-434730831
@kl0u @StephanEwen
Hi guys,
Regarding your question:
> - Does HDFS permit to rename to an already existing file name (replacing
that existing file)?
I've double-checked it. HDFS has no ability to rename a file onto an existing
file (there is no overwrite-on-rename), but this pull request resolves that issue.
In case of a failure, after restarting, the 'truncate' method checks whether
the original file exists:
- If the original file exists, the process starts again from the beginning.
- If the original file does not exist but a file with the '*.truncated'
extension does, the absence of the original file tells us that the truncated
file was fully written and the original was deleted; the process crashed at
the stage of renaming the truncated file. We can therefore use the file with
the '*.truncated' extension as the result and finish the truncation process
(see the sketch below).
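To make the two recovery cases above concrete, here is a minimal sketch in
Java against the Hadoop FileSystem API. The method names, the 4 KB buffer,
and the copy-based truncation are my own illustrative assumptions, not the
exact code of this PR:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class LegacyTruncateSketch {

    /** Resumes an interrupted legacy truncate, following the two cases above. */
    static void recoverTruncate(FileSystem fs, Path original, long truncateLength)
            throws IOException {
        // Illustrative naming convention for the intermediate file.
        Path truncated = new Path(original.getParent(), original.getName() + ".truncated");

        if (fs.exists(original)) {
            // Case 1: the original still exists, so the copy phase did not
            // complete. Drop any partial copy and restart the whole truncation.
            fs.delete(truncated, false);
            legacyTruncate(fs, original, truncated, truncateLength);
        } else if (fs.exists(truncated)) {
            // Case 2: the original is gone but the '*.truncated' file exists.
            // The copy finished and the crash happened before the rename, so
            // only the rename is left to finish the truncation.
            fs.rename(truncated, original);
        } else {
            throw new IOException("Neither original nor truncated file found: " + original);
        }
    }

    /** Copy-based truncate emulation: copy the first truncateLength bytes into
     *  the '*.truncated' file, then replace the original with the copy. */
    static void legacyTruncate(FileSystem fs, Path original, Path truncated, long truncateLength)
            throws IOException {
        byte[] buffer = new byte[4096];
        try (FSDataInputStream in = fs.open(original);
             FSDataOutputStream out = fs.create(truncated, true)) {
            long remaining = truncateLength;
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    break;
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
        fs.delete(original, false);
        fs.rename(truncated, original);
    }
}
```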
Also, I would like to clarify your idea regarding a recoverable writer with
the "recover for resume" property.
As far as I understand this approach: if the Hadoop version is 2.7 or later,
we instantiate a recoverable writer with the native Hadoop truncate logic, and
its supportsResume() method returns true. Otherwise, we instantiate a
recoverable writer that never uses the truncate method (it only creates new
files), and its supportsResume() method returns false. A sketch of the version
check follows below.
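Here is a minimal sketch of how that version check could look, assuming we key
off Hadoop's reported runtime version (the class name and the placeholder
writer names in the comments are made up for illustration):

```java
import org.apache.hadoop.util.VersionInfo;

class TruncateSupportCheck {

    /** Returns true if the runtime Hadoop version is 2.7 or later,
     *  i.e. the native truncate is available on the FileSystem. */
    static boolean truncateAvailable() {
        String[] version = VersionInfo.getVersion().split("\\.");
        int major = Integer.parseInt(version[0]);
        int minor = version.length > 1 ? Integer.parseInt(version[1]) : 0;
        return major > 2 || (major == 2 && minor >= 7);
    }

    // Usage sketch: pick the writer implementation based on the check.
    // (The writer class names are placeholders, not existing Flink classes.)
    //
    // RecoverableWriter writer = truncateAvailable()
    //         ? new TruncateBasedWriter(fs)   // supportsResume() -> true
    //         : new CreateOnlyWriter(fs);     // supportsResume() -> false
}
```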
If you are OK with this approach, I can prepare another pull request. But in
that case, I would need to wait until the logic that checks the
supportsResume() method is implemented.
Maybe I could help you with that?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Support truncate method for old Hadoop versions in
> HadoopRecoverableFsDataOutputStream
> --------------------------------------------------------------------------------------
>
> Key: FLINK-10203
> URL: https://issues.apache.org/jira/browse/FLINK-10203
> Project: Flink
> Issue Type: Bug
> Components: DataStream API, filesystem-connector
> Affects Versions: 1.6.0, 1.6.1, 1.7.0
> Reporter: Artsem Semianenka
> Assignee: Artsem Semianenka
> Priority: Major
> Labels: pull-request-available
> Attachments: legacy truncate logic.pdf
>
>
> The new StreamingFileSink (introduced in Flink 1.6) uses the
> HadoopRecoverableFsDataOutputStream wrapper to write data to HDFS.
> HadoopRecoverableFsDataOutputStream wraps FSDataOutputStream to make it
> possible to restore from a certain point of a file after a failure and
> continue writing data. To achieve this recovery functionality,
> HadoopRecoverableFsDataOutputStream uses the "truncate" method, which was
> introduced only in Hadoop 2.7.
> Unfortunately, there are a few official Hadoop distributions whose latest
> versions still use Hadoop 2.6 (these distributions include Cloudera and
> Pivotal HD). As a result, Flink's Hadoop connector can't work with these
> distributions, even though Flink declares support for Hadoop from version
> 2.4.0 upwards
> (https://ci.apache.org/projects/flink/flink-docs-release-1.6/start/building.html#hadoop-versions).
> I guess we should emulate the functionality of the "truncate" method for
> older Hadoop versions.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)