[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

ASF GitHub Bot (JIRA) Tue, 10 Mar 2015 02:40:22 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354597#comment-14354597
 ]


ASF GitHub Bot commented on FLINK-1219:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/189#discussion_r26108160
  
    --- Diff: docs/flink_on_tez_guide.md ---
    @@ -0,0 +1,293 @@
    +---
    +title: "Running Flink on YARN leveraging Tez"
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +
    +<a href="#top"></a>
    +
    +## Introduction
    +
    +You can run Flink using Tez as an execution environment. Flink on Tez 
    +is currently included in *flink-staging* in alpha. All classes are
    +localted in the *org.apache.flink.tez* package.
    +
    +## Why Flink on Tez
    +
    +[Apache Tez](tez.apache.org) is a scalable data processing
    +platform. Tez provides an API for specifying a directed acyclic
    +graph (DAG), and functionality for placing the DAG vertices in YARN
    +containers, as well as data shuffling.  In Flink's architecture,
    +Tez is at about the same level as Flink's network stack. While Flink's
    +network stack focuses heavily on low latency in order to support 
    +pipelining, data streaming, and iterative algorithms, Tez
    +focuses on scalability and elastic resource usage.
    +
    +Thus, by replacing Flink's network stack with Tez, users can get 
scalability
    +and elastic resource usage in shared clusters while retaining Flink's 
    +APIs, optimizer, and runtime algorithms (local sorts, hash tables, etc).
    +
    +Flink programs can run almost unmodified using Tez as an execution
    +environment. Tez supports local execution (e.g., for debugging), and 
    +remote execution on YARN.
    +
    +
    +## Local execution
    +
    +The `LocalTezEnvironment` can be used run programs using the local
    +mode provided by Tez. This is for example WordCount using Tez local mode.
    +It is identical to a normal Flink WordCount, except that the 
`LocalTezEnvironment` is used.
    +To run in local Tez mode, you can simply run a Flink on Tez program
    +from your IDE (e.g., right click and run).
    +  
    +{% highlight java %}
    +public class WordCountExample {
    +    public static void main(String[] args) throws Exception {
    +        final LocalTezEnvironment env = LocalTezEnvironment.create();
    +
    +       DataSet<String> text = env.fromElements(
    +            "Who's there?",
    +            "I think I hear them. Stand, ho! Who's there?");
    +
    +        DataSet<Tuple2<String, Integer>> wordCounts = text
    +            .flatMap(new LineSplitter())
    +            .groupBy(0)
    +            .sum(1);
    +
    +        wordCounts.print();
    +
    +        env.execute("Word Count Example");
    +    }
    +
    +    public static class LineSplitter implements FlatMapFunction<String, 
Tuple2<String, Integer>> {
    +        @Override
    +        public void flatMap(String line, Collector<Tuple2<String, 
Integer>> out) {
    +            for (String word : line.split(" ")) {
    +                out.collect(new Tuple2<String, Integer>(word, 1));
    +            }
    +        }
    +    }
    +}
    +{% endhighlight %}
    +
    +## YARN execution
    +
    +### Setup
    +
    +- Install Tez on your Hadoop 2 cluster following the instructions from the
    +  [Apache Tez website](http://tez.apache.org/install.html). If you are 
able to run 
    +  the examples that ship with Tez, then Tez has been successfully 
installed.
    +  
    +- Currently, you need to build Flink yourself to obtain Flink on Tez
    +  (the reason is a Hadoop version compatibility: Tez releases artifacts
    +  on Maven central with a Hadoop 2.6.0 dependency). Build Flink
    +  using `mvn -DskipTests clean package -Dhadoop.version=X.X.X 
-Pinclude-tez`.
    +  Make sure that the Hadoop version matches the version that Tez uses.
    +  Obtain the jar file contained in the Flink distribution under
    +  `flink-staging/flink-tez/target/flink-tez-x.y.z-flink-fat-jar.jar` 
    +  and upload it to some directory in HDFS. E.g., to upload the file
    +  to the directory `/apps`, execute
    +  {% highlight bash %}
    +  $ hadoop fs -put /path/to/flink-tez-x.y.z-flink-fat-jar.jar /apps
    +  {% endhighlight %}  
    + 
    +- Edit the tez-site.xml configuration file, adding an entry that points to 
the
    +  location of the file. E.g., assuming that the file is in the directory 
`/apps/`, 
    +  add the following entry to tez-site.xml
    +    ~~~<property>
    +      <name>tez.aux.uris</name>
    +      
<value>${fs.default.name}/apps/flink-tez-x.y.z-flink-fat-jar.jar</value>
    +    </property>
    +    ~~~
    +    
    +- At this point, you should be able to run the pre-packaged examples, 
e.g., run WordCount as:
    +  {% highlight bash %}
    +  $ hadoop jar /path/to/flink-tez-x.y.z-flink-fat-jar.jar 
org.apache.flink.tez.ExampleDriver \
    --- End diff --
    
    The classname `org.apache.flink.tez.ExampleDriver` is wrong.


> Add support for Apache Tez as execution engine
> ----------------------------------------------
>
>                 Key: FLINK-1219
>                 URL: https://issues.apache.org/jira/browse/FLINK-1219
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Kostas Tzoumas
>            Assignee: Kostas Tzoumas
>
> This is an umbrella issue to track Apache Tez support.
> The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

Reply via email to