Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3838#discussion_r117722682
  
    --- Diff: docs/dev/stream/python.md ---
    @@ -0,0 +1,649 @@
    +---
    +title: "Python Programming Guide (Streaming)"
    +is_beta: true
    +nav-title: Python API
    +nav-parent_id: streaming
    +nav-pos: 63
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +Streaming analysis programs in Flink are regular programs that implement transformations on
    +streaming data sets (e.g., filtering, mapping, joining, grouping). The streaming data sets are initially
    +created from certain sources (e.g., by reading from Apache Kafka, reading files, or from collections).
    +Results are returned via sinks, which may for example write the data to (distributed) files, or to
    +standard output (for example the command line terminal). Flink streaming programs run in a variety
    +of contexts: standalone, or embedded in other programs. The execution can happen in a local JVM, or
    +on clusters of many machines.
    +
    +In order to create your own Flink streaming program, we encourage you to start with the
    +[program skeleton](#program-skeleton) and gradually add your own
    +[transformations](#transformations). The remaining sections act as references for additional
    +operations and advanced features.
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +Jython Framework
    +---------------
    +Flink's Python streaming API uses the Jython framework (see <http://www.jython.org/archive/21/docs/whatis.html>)
    +to drive the execution of a given script. The Python streaming layer is actually a thin wrapper around the
    +existing Java streaming APIs.
    +
    +#### Constraints
    +There are two main constraints for using Jython:
    +
    +* The latest supported Python version is 2.7
    +* It is not straightforward to use Python C extensions
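
A minimal illustration of the C-extension constraint, assuming a plain Jython 2.7 interpreter (this snippet is not part of the patch): pure-Python standard-library modules import normally, while C-extension packages generally do not.

```python
import json

# Pure-Python standard-library modules such as json need no C extension,
# so they work under Jython 2.7 just as they do under CPython.
payload = json.dumps({"framework": "jython", "python": "2.7"}, sort_keys=True)
print(payload)

# By contrast, C-extension packages (e.g., numpy) typically cannot be
# imported under Jython and fail with an ImportError.
```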
    +
    +Streaming Program Example
    +-------------------------
    +The following streaming program is a complete, working example of WordCount. You can copy & paste the code
    +to run it locally (see the notes later in this section). It counts the number of occurrences of each word
    +(case-insensitive) in a stream of sentences, over a window of 50 milliseconds, and prints the results to the
    +standard output.
    +
    +{% highlight python %}
    +from org.apache.flink.streaming.api.functions.source import SourceFunction
    +from org.apache.flink.api.common.functions import FlatMapFunction, ReduceFunction
    +from org.apache.flink.api.java.functions import KeySelector
    +from org.apache.flink.python.api.jython import PythonStreamExecutionEnvironment
    +from org.apache.flink.streaming.api.windowing.time.Time import milliseconds
    +
    +
    +class Generator(SourceFunction):
    +    def __init__(self, num_iters):
    +        self._running = True
    +        self._num_iters = num_iters
    +
    +    def run(self, ctx):
    +        counter = 0
    +        while self._running and counter < self._num_iters:
    +            ctx.collect('Hello World')
    +            counter += 1
    +
    +    def cancel(self):
    +        self._running = False
    +
    +
    +class Tokenizer(FlatMapFunction):
    +    def flatMap(self, value, collector):
    +        for word in value.lower().split():
    +            collector.collect((1, word))
    +
    +
    +class Selector(KeySelector):
    +    def getKey(self, input):
    +        return input[1]
    +
    +
    +class Sum(ReduceFunction):
    +    def reduce(self, input1, input2):
    +        count1, word1 = input1
    +        count2, word2 = input2
    +        return (count1 + count2, word1)
    +
    +def main():
    +    env = PythonStreamExecutionEnvironment.get_execution_environment()
    +    env.create_python_source(Generator(num_iters=1000)) \
    +        .flat_map(Tokenizer()) \
    +        .key_by(Selector()) \
    +        .time_window(milliseconds(50)) \
    +        .reduce(Sum()) \
    +        .print()
    +    env.execute()
    +
    +
    +if __name__ == '__main__':
    +    main()
    +{% endhighlight %}
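
Independently of Flink, the tuple logic of the `Tokenizer` and `Sum` functions above can be checked with plain Python's `functools.reduce`; a small sketch with assumed helper names, not part of the patch:

```python
from functools import reduce

def tokenize(sentence):
    # Mirrors Tokenizer.flatMap: emit a (1, word) pair per lower-cased word.
    return [(1, word) for word in sentence.lower().split()]

def sum_counts(a, b):
    # Mirrors Sum.reduce: add the counts, keep the first word.
    count1, word1 = a
    count2, word2 = b
    return (count1 + count2, word1)

# Three copies of the source sentence, as the Generator above would emit.
tokens = tokenize('Hello World') + tokenize('Hello World') + tokenize('Hello World')
hello = [t for t in tokens if t[1] == 'hello']
print(reduce(sum_counts, hello))  # (3, 'hello')
```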
    +
    +**Notes:**
    +
    +- If execution is done on a local cluster, you may replace the last line in the `main()` function
    +  with **`env.execute(True)`**
    +- Execution on a multi-node cluster requires shared storage (e.g., HDFS), which needs to be configured
    +  upfront.
    +- The output of the given script is directed to the standard output. Consequently, the output
    +  is written to the corresponding worker `.out` filed. If the script is executed inside the IntelliJ IDE,
    --- End diff --
    
    typo: filed -> file

