ASF GitHub Bot commented on FLINK-9113:

GitHub user twalthr opened a pull request:


    [FLINK-9113] [connectors] Fix flushing behavior of bucketing sink for local 

    ## What is the purpose of the change
    This PR changes the flushing behavior for HDFS' local filesystem abstraction. See also FLINK-9113 for more details.
    ## Brief change log
    - Use `hsync` for local filesystems
    - Add method to disable the new behavior
    - Add a check that verifies the valid-length files are correct
    ## Verifying this change
    This fix is difficult to verify because it requires an OS process that is killed before syncing. I added a dedicated local filesystem test.
    ## Does this pull request potentially affect one of the following parts:
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): yes
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    ## Documentation
      - Does this pull request introduce a new feature? no
      - If yes, how is the feature documented? JavaDocs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink FLINK-9113

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5811
commit 543f206f0e9e8415468f5d1092553754a8869fc7
Author: Timo Walther <twalthr@...>
Date:   2018-04-04T08:29:57Z

    [FLINK-9113] [connectors] Fix flushing behavior of bucketing sink for local 


> Data loss in BucketingSink when writing to local filesystem
> -----------------------------------------------------------
>                 Key: FLINK-9113
>                 URL: https://issues.apache.org/jira/browse/FLINK-9113
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>            Reporter: Timo Walther
>            Assignee: Timo Walther
>            Priority: Major
> This issue is closely related to FLINK-7737. By default the bucketing sink 
> uses HDFS's {{org.apache.hadoop.fs.FSDataOutputStream#hflush}} for 
> performance reasons. However, this leads to data loss in case of TaskManager 
> failures when writing to a local filesystem 
> {{org.apache.hadoop.fs.LocalFileSystem}}. We should use {{hsync}} by default 
> in local filesystem cases and make it possible to disable this behavior if 
> needed.
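
The difference between flushing and syncing described in the issue can be illustrated with a minimal plain-JDK sketch. It is only a loose analogy and not the Flink or Hadoop code: `FileChannel.write` hands bytes to the operating system, while `FileChannel.force(true)` additionally asks the OS to persist them to the storage device, which is roughly the guarantee `hsync` aims for. The class `LocalSyncSketch`, the method `writeRecord`, and the `syncToDisk` flag are hypothetical names introduced for illustration; the flag stands in for the proposed option to disable the new default.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Flink or Hadoop.
public class LocalSyncSketch {

    /**
     * Appends a record to a local file and returns the file size afterwards.
     * When syncToDisk is true, the data is forced to the storage device
     * (an fsync-like guarantee); otherwise it may still sit in OS caches.
     */
    public static long writeRecord(Path file, String record, boolean syncToDisk)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf); // bytes reach the OS, not necessarily the disk
            }
            if (syncToDisk) {
                channel.force(true); // persist file content and metadata
            }
            return channel.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bucket-part", ".txt");
        try {
            long afterFirst = writeRecord(file, "record-1\n", true);   // synced write
            long afterSecond = writeRecord(file, "record-2\n", false); // unsynced write
            System.out.println(afterFirst + " " + afterSecond);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Both calls produce the same file contents in normal operation. The difference only shows up if the process or machine dies before cached data is persisted, which is why the PR notes that the fix is hard to verify in a test.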

This message was sent by Atlassian JIRA