[ 
https://issues.apache.org/jira/browse/FLINK-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425196#comment-16425196
 ] 

ASF GitHub Bot commented on FLINK-9113:
---------------------------------------

GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/5811

    [FLINK-9113] [connectors] Fix flushing behavior of bucketing sink for local filesystems

    ## What is the purpose of the change
    
    This PR changes the flushing behavior of the bucketing sink when it writes 
    through Hadoop's local filesystem abstraction. See FLINK-9113 for more details.
    
    
    ## Brief change log
    
    - Use `hsync` instead of `hflush` for local filesystems
    - Add a method to disable the new behavior
    - Add a check that verifies the valid-length files are correct
    
    
    ## Verifying this change
    
    This fix is difficult to verify because it requires an OS process that is 
    killed before syncing. I added a dedicated local filesystem test.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): yes
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? no
      - If yes, how is the feature documented? JavaDocs


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink FLINK-9113

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5811.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5811
    
----
commit 543f206f0e9e8415468f5d1092553754a8869fc7
Author: Timo Walther <twalthr@...>
Date:   2018-04-04T08:29:57Z

    [FLINK-9113] [connectors] Fix flushing behavior of bucketing sink for local filesystems

----


> Data loss in BucketingSink when writing to local filesystem
> -----------------------------------------------------------
>
>                 Key: FLINK-9113
>                 URL: https://issues.apache.org/jira/browse/FLINK-9113
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>            Reporter: Timo Walther
>            Assignee: Timo Walther
>            Priority: Major
>
> This issue is closely related to FLINK-7737. By default the bucketing sink 
> uses HDFS's {{org.apache.hadoop.fs.FSDataOutputStream#hflush}} for 
> performance reasons. However, this leads to data loss in case of TaskManager 
> failures when writing to a local filesystem 
> {{org.apache.hadoop.fs.LocalFileSystem}}. We should use {{hsync}} by default 
> in local filesystem cases and make it possible to disable this behavior if 
> needed.
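
The difference between flushing and syncing that the issue description relies on can be illustrated with plain JDK classes. This is a minimal sketch and an analogy only: it does not use Hadoop's `FSDataOutputStream`, and it does not reproduce how `LocalFileSystem` implements `hflush` or `hsync`. The class name `DurableWriteSketch`, the method `write`, and the `sync` parameter are hypothetical names chosen for this illustration.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DurableWriteSketch {

    /**
     * Writes data to the given path. With sync == false the bytes are only
     * flushed out of the in-process buffer; with sync == true the operating
     * system is additionally asked to persist them to the storage device.
     */
    static void write(Path path, byte[] data, boolean sync) throws IOException {
        try (FileOutputStream file = new FileOutputStream(path.toFile());
             OutputStream out = new BufferedOutputStream(file)) {
            out.write(data);
            // flush(): hands the buffered bytes to the underlying stream,
            // but gives no guarantee that they have reached the device.
            out.flush();
            if (sync) {
                // force(true): requests that file content and metadata
                // be written to the storage device before returning.
                file.getChannel().force(true);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("durable-write-sketch", ".txt");
        write(path, "record\n".getBytes(StandardCharsets.UTF_8), true);
        System.out.println("bytes on disk: " + Files.size(path));
        Files.delete(path);
    }
}
```

Syncing on every write is the safer choice but usually the slower one, which is presumably why the issue proposes it as a default for local filesystems that can still be disabled.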



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
