[ https://issues.apache.org/jira/browse/FLINK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728745#comment-14728745 ]

ASF GitHub Bot commented on FLINK-2583:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1084#discussion_r38627730
  
    --- Diff: docs/apis/streaming_guide.md ---
    @@ -1836,6 +1837,110 @@ More about information about Elasticsearch can be found [here](https://elastic.c
     
     [Back to top](#top)
     
    +### HDFS
    +
    +This connector provides a Sink that writes rolling files to HDFS. To use this connector, add the
    +following dependency to your project:
    +
    +{% highlight xml %}
    +<dependency>
    +  <groupId>org.apache.flink</groupId>
    +  <artifactId>flink-connector-hdfs</artifactId>
    +  <version>{{site.version}}</version>
    +</dependency>
    +{% endhighlight %}
    +
    +Note that the streaming connectors are currently not part of the binary
    +distribution. See
    +[here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution)
    +for information about how to package the program with the libraries for
    +cluster execution.
    +
    +#### HDFS Rolling File Sink
    +
    +The rolling behaviour as well as the writing can be configured, but we will get to that later.
    +This is how you can create a default rolling sink:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +DataStream<String> input = ...;
    +
    +input.addSink(new RollingHDFSSink<String>("/base/path"));
    +
    +{% endhighlight %}
    +</div>
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val input: DataStream[String] = ...
    +
    +input.addSink(new RollingHDFSSink("/base/path"))
    +
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +The only required parameter is the base path in HDFS where the rolling files (buckets) will be
    +stored. The sink can be configured by specifying a custom bucketer, HDFS writer and batch size.
    +
    +By default the rolling sink will use the pattern `"yyyy-MM-dd--HH"` to name the rolling buckets.
    --- End diff --
    
    Can you make it a bit more explicit that a new directory is created when the pattern changes?
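
For readers following the thread, here is a rough idea of what configuring the sink described in the quoted documentation could look like. This is only a sketch: the class name `RollingHDFSSink` and the base-path constructor come from the docs under review, but the setter names (`setBucketer`, `setWriter`, `setBatchSize`) and the `DateTimeBucketer`/`StringWriter` helpers are assumptions about the API, not confirmed names from the pull request.

{% highlight java %}
// Sketch only: setter names and helper classes below are assumed, not taken from the PR.
DataStream<String> input = ...;

RollingHDFSSink<String> sink = new RollingHDFSSink<String>("/base/path");
sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH")); // custom bucket (directory) naming pattern
sink.setWriter(new StringWriter<String>());               // how records are serialized into part files
sink.setBatchSize(1024 * 1024 * 64);                      // roll to a new part file after roughly 64 MB

input.addSink(sink);
{% endhighlight %}

In the documented default case only the base path is given and everything else falls back to the defaults described in the diff.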
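
On the question about new directories: if the sink derives bucket names by formatting the current time with the default pattern, a new bucket directory appears under the base path whenever the formatted value changes, i.e. once per hour for `"yyyy-MM-dd--HH"`. A small standalone illustration using plain `java.text.SimpleDateFormat`, assuming the sink formats processing time with the documented pattern:

{% highlight java %}
import java.text.SimpleDateFormat;
import java.util.Date;

public class BucketPatternExample {
    public static void main(String[] args) {
        // The default pattern from the documentation under review.
        SimpleDateFormat bucketFormat = new SimpleDateFormat("yyyy-MM-dd--HH");

        // Two timestamps one hour apart format to different bucket names,
        // so elements arriving after the hour boundary would land in a new directory.
        Date now = new Date();
        Date oneHourLater = new Date(now.getTime() + 60 * 60 * 1000);

        System.out.println("/base/path/" + bucketFormat.format(now));
        System.out.println("/base/path/" + bucketFormat.format(oneHourLater));
    }
}
{% endhighlight %}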


> Add Stream Sink For Rolling HDFS Files
> --------------------------------------
>
>                 Key: FLINK-2583
>                 URL: https://issues.apache.org/jira/browse/FLINK-2583
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>             Fix For: 0.10
>
>
> In addition to having configurable file-rolling behavior, the Sink should also 
> integrate with checkpointing to make it possible to have exactly-once 
> semantics throughout the topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
