[
https://issues.apache.org/jira/browse/FLINK-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mohamed Amine ABDESSEMED updated FLINK-2672:
--------------------------------------------
Description:
An interesting use case of the HDFS Sink is to dispatch data into multiple
directories depending on attributes present in the source data.
For example, for data with timestamp and status fields, we want to
write it into different directories using a pattern like:
/somepath/%{timestamp}/%{status}
The expected results are something like:
/somepath/some_timestamp/wellformed
/somepath/some_timestamp/malformed
/somepath/some_timestamp/incomplete
...
To support this functionality, the bucketing and checkpointing logic needs to be
changed.
Note: For now, this can be done with the current version of the Rolling HDFS
Sink (https://github.com/apache/flink/pull/1084) by splitting the data stream
and using multiple HDFS sinks.
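A minimal sketch, in plain Java, of the kind of per-record path expansion the proposed partitioned output format implies. The class name, method, and the "unknown" fallback bucket are hypothetical illustrations, not part of the RollingSink API: the sink would resolve each %{field} token in the configured pattern against the record's attributes to pick a bucket directory.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands %{field} tokens in a bucket-path pattern
// such as /somepath/%{timestamp}/%{status} against a record's attributes.
public class BucketPathExpander {
    private static final Pattern TOKEN = Pattern.compile("%\\{(\\w+)\\}");

    public static String expand(String pattern, Map<String, String> record) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = record.get(m.group(1));
            // Fall back to a literal "unknown" bucket for missing attributes.
            m.appendReplacement(sb,
                Matcher.quoteReplacement(value != null ? value : "unknown"));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, a record with timestamp=20150915 and status=wellformed would be routed to /somepath/20150915/wellformed.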
was:
An interesting use case of the HDFS Sink is to dispatch data into multiple
directories depending on attributes present in the source data.
For example, for data with timestamp and status fields, we want to
write it into different directories using a pattern like:
/somepath/%{timestamp}/%{status}
The expected results are something like:
/somepath/some_timestamp/wellformed
/somepath/some_timestamp/malformed
/somepath/some_timestamp/incomplete
...
To support this functionality, the bucketing and checkpointing logic needs to be
changed.
Note: For now, this can be done with the current version of the Rolling HDFS
Sink by splitting the data stream and using multiple HDFS sinks.
> Add partitioned output format to HDFS RollingSink
> -------------------------------------------------
>
> Key: FLINK-2672
> URL: https://issues.apache.org/jira/browse/FLINK-2672
> Project: Flink
> Issue Type: Improvement
> Components: Streaming Connectors
> Affects Versions: 0.10
> Reporter: Mohamed Amine ABDESSEMED
> Priority: Minor
> Labels: features
>
> An interesting use case of the HDFS Sink is to dispatch data into multiple
> directories depending on attributes present in the source data.
> For example, for data with timestamp and status fields, we want to
> write it into different directories using a pattern like:
> /somepath/%{timestamp}/%{status}
> The expected results are something like:
> /somepath/some_timestamp/wellformed
> /somepath/some_timestamp/malformed
> /somepath/some_timestamp/incomplete
> ...
> To support this functionality, the bucketing and checkpointing logic needs to
> be changed.
> Note: For now, this can be done with the current version of the Rolling HDFS
> Sink (https://github.com/apache/flink/pull/1084) by splitting the data stream
> and using multiple HDFS sinks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)