[ https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231531#comment-14231531 ]

ASF GitHub Bot commented on FLINK-1081:
---------------------------------------

Github user chiwanpark commented on the pull request:

    https://github.com/apache/incubator-flink/pull/226#issuecomment-65237135
  
    Right. When the FileStreamFunction runs as multiple instances in a 
cluster, every instance will see the modified file and process it. This will 
create duplicate records. I haven't tested with a cluster on YARN yet, but I 
have tested with a local cluster with a parallelism of 5, and the result is 
exactly as expected.
    
    I think this must be mentioned in the documentation.
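
    To make the duplication concrete, here is a minimal sketch of one way 
the parallel instances could divide modified files among themselves, so that 
each file is processed exactly once. It is not what this pull request does. 
The class name, the method names, and the example paths are all hypothetical, 
and the subtask index and parallelism are passed in as plain integers rather 
than read from the Flink runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not part of the pull request or of Flink itself.
public class FileOwnershipSketch {

    // A file belongs to exactly one subtask: the one whose index equals
    // the path's hash modulo the parallelism. floorMod keeps the result
    // non-negative even when hashCode() is negative.
    public static boolean isOwner(String path, int subtaskIndex, int parallelism) {
        return Math.floorMod(path.hashCode(), parallelism) == subtaskIndex;
    }

    // Filters the list of modified files down to those owned by one subtask.
    public static List<String> filesFor(List<String> modified, int subtaskIndex, int parallelism) {
        List<String> owned = new ArrayList<>();
        for (String path : modified) {
            if (isOwner(path, subtaskIndex, parallelism)) {
                owned.add(path);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        List<String> modified = Arrays.asList(
                "hdfs:///logs/a.txt", "hdfs:///logs/b.txt", "hdfs:///logs/c.txt");
        int parallelism = 5;

        int processed = 0;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            List<String> owned = filesFor(modified, subtask, parallelism);
            processed += owned.size();
            System.out.println("subtask " + subtask + " -> " + owned);
        }
        // Without the filter, every subtask would process all files,
        // giving parallelism * modified.size() records instead.
        System.out.println("files processed in total: " + processed);
    }
}
```

    With the filter, the total across all subtasks equals the number of 
modified files. Without it, every subtask processes every file, which is the 
duplication described above.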


> Add HDFS file-stream source for streaming
> -----------------------------------------
>
>                 Key: FLINK-1081
>                 URL: https://issues.apache.org/jira/browse/FLINK-1081
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.7.0-incubating
>            Reporter: Gyula Fora
>            Assignee: Chiwan Park
>              Labels: starter
>
> Add a data stream source that will monitor a selected directory on HDFS (or 
> other filesystems as well) and will process all new files created.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
