[
https://issues.apache.org/jira/browse/FLINK-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mohamed Amine ABDESSEMED updated FLINK-2672:
--------------------------------------------
Description:
An interesting use case of the HDFS Sink is to dispatch data into multiple
directories depending on attributes present in the source data.
For example, for data with timestamp and status fields, we want to
write it into different directories using a pattern like:
/somepath/%{timestamp}/%{status}
The expected results are something like:
/somepath/some_timestamp/wellformed
/somepath/some_timestamp/malformed
/somepath/some_timestamp/incomplete
...
To support this functionality, the bucketing and checkpointing logic needs to be
changed.
Note: For now, this can be done with the current version of the Rolling HDFS
Sink (https://github.com/apache/flink/pull/1084) by splitting the data stream
and using multiple HDFS sinks.
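A minimal sketch, in plain Java, of the kind of per-record path expansion the proposed partitioned output format implies. The class name, method, and the "unknown" fallback bucket are hypothetical illustrations, not part of the RollingSink API: the sink would resolve each %{field} token in the configured pattern against the record's attributes to pick a bucket directory.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands %{field} tokens in a bucket-path pattern
// such as /somepath/%{timestamp}/%{status} against a record's attributes.
public class BucketPathExpander {
    private static final Pattern TOKEN = Pattern.compile("%\\{(\\w+)\\}");

    public static String expand(String pattern, Map<String, String> record) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = record.get(m.group(1));
            // Fall back to a literal "unknown" bucket for missing attributes.
            m.appendReplacement(sb,
                Matcher.quoteReplacement(value != null ? value : "unknown"));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, a record with timestamp=20150915 and status=wellformed would be routed to /somepath/20150915/wellformed.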
was:
An interesting use case of the HDFS Sink is to dispatch data into multiple
directories depending on attributes present in the source data.
For example, for data with timestamp and status fields, we want to
write it into different directories using a pattern like:
/somepath/%{timestamp}/%{status}
The expected results are something like:
/somepath/some_timestamp/wellformed
/somepath/some_timestamp/malformed
/somepath/some_timestamp/incomplete
...
To support this functionality, the bucketing and checkpointing logic needs to be
changed.
Note: For now, this can be done with the current version of the Rolling HDFS
Sink by splitting the data stream and using multiple HDFS sinks.
> Add partitioned output format to HDFS RollingSink
> -------------------------------------------------
>
> Key: FLINK-2672
> URL: https://issues.apache.org/jira/browse/FLINK-2672
> Project: Flink
> Issue Type: Improvement
> Components: Streaming Connectors
> Affects Versions: 0.10
> Reporter: Mohamed Amine ABDESSEMED
> Priority: Minor
> Labels: features
>
> An interesting use case of the HDFS Sink is to dispatch data into multiple
> directories depending on attributes present in the source data.
> For example, for data with timestamp and status fields, we want to
> write it into different directories using a pattern like:
> /somepath/%{timestamp}/%{status}
> The expected results are something like:
> /somepath/some_timestamp/wellformed
> /somepath/some_timestamp/malformed
> /somepath/some_timestamp/incomplete
> ...
> To support this functionality, the bucketing and checkpointing logic needs to
> be changed.
> Note: For now, this can be done with the current version of the Rolling HDFS
> Sink (https://github.com/apache/flink/pull/1084) by splitting the data stream
> and using multiple HDFS sinks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)