[ 
https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236585#comment-14236585
 ] 

ASF GitHub Bot commented on FLINK-1081:
---------------------------------------

Github user chiwanpark commented on the pull request:

    https://github.com/apache/incubator-flink/pull/226#issuecomment-65885299
  
    I would like to propose a new implementation of this feature and would 
appreciate feedback on the idea. It consists of two functions.
    
    1. `FileMonitoringFunction` emits a tuple with three fields: (modified file 
path, start offset, end offset). This function implements `NonParallelInput`.
    2. `FileMapFunction` (I think this function should be renamed) reads the 
file at the given path and emits its contents in the given range. This 
function implements `FlatMapFunction` because there is no way to link two 
source functions together.
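    
    As a rough illustration of the two roles above, here is a minimal plain-Java 
sketch. It does not use the Flink API: `FileRangeSketch`, `FileRange`, `Monitor`, 
`scan` and `readRange` are hypothetical names, and it assumes that files only 
grow by appending and that a range fits in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FileRangeSketch {

    /** Hypothetical stand-in for the (file path, start offset, end offset) tuple. */
    static final class FileRange {
        final String path;
        final long start; // inclusive byte offset
        final long end;   // exclusive byte offset

        FileRange(String path, long start, long end) {
            this.path = path;
            this.start = start;
            this.end = end;
        }
    }

    /** Monitoring role: remembers how far each file has already been reported. */
    static final class Monitor {
        private final Map<String, Long> reportedUpTo = new HashMap<>();

        /** Returns one range per file that grew since the previous scan. */
        List<FileRange> scan(List<Path> files) throws IOException {
            List<FileRange> ranges = new ArrayList<>();
            for (Path file : files) {
                String key = file.toString();
                long previous = reportedUpTo.getOrDefault(key, 0L);
                long size = Files.size(file);
                if (size > previous) { // assumption: truncation is not handled
                    ranges.add(new FileRange(key, previous, size));
                    reportedUpTo.put(key, size);
                }
            }
            return ranges;
        }
    }

    /** Reading role: returns the bytes in [start, end) decoded as UTF-8. */
    static String readRange(FileRange range) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(range.path, "r")) {
            // Assumption: the range is small enough for a single byte array.
            byte[] buffer = new byte[(int) (range.end - range.start)];
            file.seek(range.start);
            file.readFully(buffer);
            // A range boundary may split a multi-byte character; ignored here.
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }
}
```

    In the proposal, `scan` would correspond to what `FileMonitoringFunction` 
emits and `readRange` to the work done inside `FileMapFunction`.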
    
    When a user calls `readFileStream` on `StreamExecutionEnvironment`, the 
system creates a `FileMonitoringFunction` and a `FileMapFunction`, links them, 
and returns the result.
    
    This implementation fixes the parallelism problem with the monitoring 
instance. The user can set the degree of parallelism of the source; in fact, 
this sets the degree of parallelism of the map function, and only one instance 
monitors the file system.
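    
    The "one monitor, many readers" split can be sketched in plain Java as 
follows. This is only an illustration under simplifying assumptions: an 
in-memory string stands in for the file, a thread pool stands in for the 
parallel map instances, and `SingleMonitorSketch`, `Range`, `monitor` and 
`read` are hypothetical names, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SingleMonitorSketch {

    /** Hypothetical (start, end) range; stands in for the monitor's output. */
    static final class Range {
        final int start; // inclusive
        final int end;   // exclusive

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Single monitor: splits the text into ranges of at most chunkSize chars. */
    static List<Range> monitor(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            ranges.add(new Range(start, Math.min(start + chunkSize, text.length())));
        }
        return ranges;
    }

    /** Readers: 'parallelism' workers turn ranges into content concurrently. */
    static List<String> read(String text, List<Range> ranges, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService readers = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (Range range : ranges) {
                pending.add(readers.submit(() -> text.substring(range.start, range.end)));
            }
            List<String> contents = new ArrayList<>();
            for (Future<String> future : pending) {
                contents.add(future.get()); // collected in submission order
            }
            return contents;
        } finally {
            readers.shutdown();
        }
    }
}
```

    Only `read` takes a parallelism argument, which mirrors the point above: the 
degree of parallelism applies to the reading side, while the ranges come from a 
single place.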
    
    Additionally, we can reuse `FileMapFunction` as a replacement for 
`FileSourceFunction`.
    
    What do you think of this approach?


> Add HDFS file-stream source for streaming
> -----------------------------------------
>
>                 Key: FLINK-1081
>                 URL: https://issues.apache.org/jira/browse/FLINK-1081
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.7.0-incubating
>            Reporter: Gyula Fora
>            Assignee: Chiwan Park
>              Labels: starter
>
> Add a data stream source that will monitor a selected directory on HDFS (or 
> other file systems as well) and will process all newly created files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
