[
https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231531#comment-14231531
]
ASF GitHub Bot commented on FLINK-1081:
---------------------------------------
Github user chiwanpark commented on the pull request:
https://github.com/apache/incubator-flink/pull/226#issuecomment-65237135
Right. When the FileStreamFunction runs as multiple parallel instances in a
cluster, every instance will see the modified file and process it, which
creates duplicated records. I haven't tested on a YARN cluster yet, but I have
tested on a local cluster with a parallelism of 5, and the result was exactly
as expected.
I think this must be mentioned in the documentation.
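To make the duplication concrete, here is a minimal sketch in plain Java. It
is not the FileStreamFunction code from the pull request; the class name,
method names, and file paths are hypothetical, and it assumes that every
parallel instance monitors the same directory listing. The second method shows
one possible way to avoid duplicates (each instance only takes the files whose
path hash maps to its index), which is an illustration, not something the pull
request implements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DuplicateReadSketch {

    // Assumption: every parallel instance scans the same listing and emits
    // every file it sees, so each file is emitted once per instance.
    static List<String> emitFromAllInstances(List<String> files, int parallelism) {
        List<String> emitted = new ArrayList<>();
        for (int instance = 0; instance < parallelism; instance++) {
            emitted.addAll(files);
        }
        return emitted;
    }

    // Hypothetical workaround: an instance only handles the files whose
    // path hash, modulo the parallelism, equals its own index.
    static List<String> emitPartitioned(List<String> files, int parallelism) {
        List<String> emitted = new ArrayList<>();
        for (int instance = 0; instance < parallelism; instance++) {
            for (String file : files) {
                if (Math.floorMod(file.hashCode(), parallelism) == instance) {
                    emitted.add(file);
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical paths, used only for illustration.
        List<String> files = Arrays.asList("hdfs:///input/a.txt", "hdfs:///input/b.txt");
        System.out.println(emitFromAllInstances(files, 5).size()); // prints 10
        System.out.println(emitPartitioned(files, 5).size());      // prints 2
    }
}
```

With a parallelism of 5 and two files, the first method emits each file five
times, while the partitioned variant emits each file exactly once.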
> Add HDFS file-stream source for streaming
> -----------------------------------------
>
> Key: FLINK-1081
> URL: https://issues.apache.org/jira/browse/FLINK-1081
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.7.0-incubating
> Reporter: Gyula Fora
> Assignee: Chiwan Park
> Labels: starter
>
> Add a data stream source that will monitor a selected directory on HDFS (or
> other filesystems as well) and will process all new files created.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)