[
https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234679#comment-14234679
]
ASF GitHub Bot commented on FLINK-1081:
---------------------------------------
Github user gyfora commented on the pull request:
https://github.com/apache/incubator-flink/pull/226#issuecomment-65709404
You are right, Robert. This behavior is unexpected at best, and we will have
to do something about it. It actually applies to other sources as well. A
central monitor would be ideal; until then, we could figure out some workaround.
The first thing that came to my mind is to somehow partition the incoming files
among the sources, for example by hashing the file names. We should of course
try to respect locality for performance.
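A rough sketch of the hashing idea, assuming the problem is that several parallel source instances each pick up the same new files. The class and method names (`FileNamePartitioner`, `ownerOf`, `filesFor`) are hypothetical and not part of Flink; each instance is assumed to know its own index and the total parallelism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Flink API: assigns every file name to exactly
// one of `parallelism` source instances based on the hash of its name.
public class FileNamePartitioner {

    // Index of the source instance responsible for the given file.
    public static int ownerOf(String fileName, int parallelism) {
        // floorMod keeps the result in [0, parallelism) even if hashCode() is negative.
        return Math.floorMod(fileName.hashCode(), parallelism);
    }

    // Filters a directory listing down to the files this instance should read.
    public static List<String> filesFor(int instanceIndex, int parallelism, List<String> newFiles) {
        List<String> mine = new ArrayList<>();
        for (String fileName : newFiles) {
            if (ownerOf(fileName, parallelism) == instanceIndex) {
                mine.add(fileName);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        List<String> newFiles = Arrays.asList("events-1.log", "events-2.log", "events-3.log", "events-4.log");
        int parallelism = 3;
        for (int i = 0; i < parallelism; i++) {
            System.out.println("instance " + i + " reads " + filesFor(i, parallelism, newFiles));
        }
    }
}
```

Every instance still lists the whole directory, but each file is processed by only one of them. This sketch ignores locality entirely: the file a given instance owns may be stored on a different node, so a real workaround would need to weigh that.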
> Add HDFS file-stream source for streaming
> -----------------------------------------
>
> Key: FLINK-1081
> URL: https://issues.apache.org/jira/browse/FLINK-1081
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.7.0-incubating
> Reporter: Gyula Fora
> Assignee: Chiwan Park
> Labels: starter
>
> Add a data stream source that will monitor a selected directory on HDFS (or
> other filesystems as well) and will process all new files created.