http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/blog/page3/index.html ---------------------------------------------------------------------- diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html index b96f0c9..350033e 100644 --- a/content/blog/page3/index.html +++ b/content/blog/page3/index.html @@ -162,7 +162,7 @@ <h2 class="blog-title"><a href="/news/2014/01/13/stratosphere-release-0.4.html">Stratosphere 0.4 Released</a></h2> <p>13 Jan 2014</p> - <p><p>We are pleased to announce that version 0.4 of the Stratosphere system has been released.</p> + <p><p>We are pleased to announce that version 0.4 of the Stratosphere system has been released. </p> </p>
http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/community.html ---------------------------------------------------------------------- diff --git a/content/community.html b/content/community.html index fdd3c4d..15e2834 100644 --- a/content/community.html +++ b/content/community.html @@ -147,17 +147,17 @@ <div class="page-toc"> <ul id="markdown-toc"> - <li><a href="#mailing-lists" id="markdown-toc-mailing-lists">Mailing Lists</a></li> - <li><a href="#irc" id="markdown-toc-irc">IRC</a></li> - <li><a href="#stack-overflow" id="markdown-toc-stack-overflow">Stack Overflow</a></li> - <li><a href="#issue-tracker" id="markdown-toc-issue-tracker">Issue Tracker</a></li> - <li><a href="#source-code" id="markdown-toc-source-code">Source Code</a> <ul> - <li><a href="#main-source-repositories" id="markdown-toc-main-source-repositories">Main source repositories</a></li> - <li><a href="#website-repositories" id="markdown-toc-website-repositories">Website repositories</a></li> + <li><a href="#mailing-lists">Mailing Lists</a></li> + <li><a href="#irc">IRC</a></li> + <li><a href="#stack-overflow">Stack Overflow</a></li> + <li><a href="#issue-tracker">Issue Tracker</a></li> + <li><a href="#source-code">Source Code</a> <ul> + <li><a href="#main-source-repositories">Main source repositories</a></li> + <li><a href="#website-repositories">Website repositories</a></li> </ul> </li> - <li><a href="#people" id="markdown-toc-people">People</a></li> - <li><a href="#former-mentors" id="markdown-toc-former-mentors">Former mentors</a></li> + <li><a href="#people">People</a></li> + <li><a href="#former-mentors">Former mentors</a></li> </ul> </div> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/downloads.html ---------------------------------------------------------------------- diff --git a/content/downloads.html b/content/downloads.html index 93802b3..ce1e6a3 100644 --- a/content/downloads.html +++ b/content/downloads.html @@ -156,10 +156,10 @@ $( 
document ).ready(function() { <div class="page-toc"> <ul id="markdown-toc"> - <li><a href="#latest-stable-release-v081" id="markdown-toc-latest-stable-release-v081">Latest stable release (v0.8.1)</a></li> - <li><a href="#maven-dependencies" id="markdown-toc-maven-dependencies">Maven Dependencies</a></li> - <li><a href="#preview-release" id="markdown-toc-preview-release">Preview release</a></li> - <li><a href="#all-releases" id="markdown-toc-all-releases">All releases</a></li> + <li><a href="#latest-stable-release-v081">Latest stable release (v0.8.1)</a></li> + <li><a href="#maven-dependencies">Maven Dependencies</a></li> + <li><a href="#preview-release">Preview release</a></li> + <li><a href="#all-releases">All releases</a></li> </ul> </div> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/faq.html ---------------------------------------------------------------------- diff --git a/content/faq.html b/content/faq.html index adf8463..48e63ef 100644 --- a/content/faq.html +++ b/content/faq.html @@ -147,39 +147,39 @@ <div class="page-toc"> <ul id="markdown-toc"> - <li><a href="#general" id="markdown-toc-general">General</a> <ul> - <li><a href="#is-flink-a-hadoop-project" id="markdown-toc-is-flink-a-hadoop-project">Is Flink a Hadoop Project?</a></li> - <li><a href="#do-i-have-to-install-apache-hadoop-to-use-flink" id="markdown-toc-do-i-have-to-install-apache-hadoop-to-use-flink">Do I have to install Apache Hadoop to use Flink?</a></li> + <li><a href="#general">General</a> <ul> + <li><a href="#is-flink-a-hadoop-project">Is Flink a Hadoop Project?</a></li> + <li><a href="#do-i-have-to-install-apache-hadoop-to-use-flink">Do I have to install Apache Hadoop to use Flink?</a></li> </ul> </li> - <li><a href="#usage" id="markdown-toc-usage">Usage</a> <ul> - <li><a href="#how-do-i-assess-the-progress-of-a-flink-program" id="markdown-toc-how-do-i-assess-the-progress-of-a-flink-program">How do I assess the progress of a Flink program?</a></li> - <li><a 
href="#how-can-i-figure-out-why-a-program-failed" id="markdown-toc-how-can-i-figure-out-why-a-program-failed">How can I figure out why a program failed?</a></li> - <li><a href="#how-do-i-debug-flink-programs" id="markdown-toc-how-do-i-debug-flink-programs">How do I debug Flink programs?</a></li> - <li><a href="#what-is-the-parallelism-how-do-i-set-it" id="markdown-toc-what-is-the-parallelism-how-do-i-set-it">What is the parallelism? How do I set it?</a></li> + <li><a href="#usage">Usage</a> <ul> + <li><a href="#how-do-i-assess-the-progress-of-a-flink-program">How do I assess the progress of a Flink program?</a></li> + <li><a href="#how-can-i-figure-out-why-a-program-failed">How can I figure out why a program failed?</a></li> + <li><a href="#how-do-i-debug-flink-programs">How do I debug Flink programs?</a></li> + <li><a href="#what-is-the-parallelism-how-do-i-set-it">What is the parallelism? How do I set it?</a></li> </ul> </li> - <li><a href="#errors" id="markdown-toc-errors">Errors</a> <ul> - <li><a href="#why-am-i-getting-a-nonserializableexception-" id="markdown-toc-why-am-i-getting-a-nonserializableexception-">Why am I getting a “NonSerializableException” ?</a></li> - <li><a href="#in-scala-api-i-get-an-error-about-implicit-values-and-evidence-parameters" id="markdown-toc-in-scala-api-i-get-an-error-about-implicit-values-and-evidence-parameters">In Scala API, I get an error about implicit values and evidence parameters</a></li> - <li><a href="#i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this" id="markdown-toc-i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this">I get an error message saying that not enough buffers are available. How do I fix this?</a></li> - <li><a href="#my-job-fails-early-with-a-javaioeofexception-what-could-be-the-cause" id="markdown-toc-my-job-fails-early-with-a-javaioeofexception-what-could-be-the-cause">My job fails early with a java.io.EOFException. 
What could be the cause?</a></li> - <li><a href="#in-eclipse-i-get-compilation-errors-in-the-scala-projects" id="markdown-toc-in-eclipse-i-get-compilation-errors-in-the-scala-projects">In Eclipse, I get compilation errors in the Scala projects</a></li> - <li><a href="#my-program-does-not-compute-the-correct-result-why-are-my-custom-key-types" id="markdown-toc-my-program-does-not-compute-the-correct-result-why-are-my-custom-key-types">My program does not compute the correct result. Why are my custom key types</a></li> - <li><a href="#i-get-a-javalanginstantiationexception-for-my-data-type-what-is-wrong" id="markdown-toc-i-get-a-javalanginstantiationexception-for-my-data-type-what-is-wrong">I get a java.lang.InstantiationException for my data type, what is wrong?</a></li> - <li><a href="#i-cant-stop-flink-with-the-provided-stop-scripts-what-can-i-do" id="markdown-toc-i-cant-stop-flink-with-the-provided-stop-scripts-what-can-i-do">I can’t stop Flink with the provided stop-scripts. What can I do?</a></li> - <li><a href="#i-got-an-outofmemoryexception-what-can-i-do" id="markdown-toc-i-got-an-outofmemoryexception-what-can-i-do">I got an OutOfMemoryException. What can I do?</a></li> - <li><a href="#why-do-the-taskmanager-log-files-become-so-huge" id="markdown-toc-why-do-the-taskmanager-log-files-become-so-huge">Why do the TaskManager log files become so huge?</a></li> + <li><a href="#errors">Errors</a> <ul> + <li><a href="#why-am-i-getting-a-nonserializableexception-">Why am I getting a “NonSerializableException” ?</a></li> + <li><a href="#in-scala-api-i-get-an-error-about-implicit-values-and-evidence-parameters">In Scala API, I get an error about implicit values and evidence parameters</a></li> + <li><a href="#i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this">I get an error message saying that not enough buffers are available. 
How do I fix this?</a></li> + <li><a href="#my-job-fails-early-with-a-javaioeofexception-what-could-be-the-cause">My job fails early with a java.io.EOFException. What could be the cause?</a></li> + <li><a href="#in-eclipse-i-get-compilation-errors-in-the-scala-projects">In Eclipse, I get compilation errors in the Scala projects</a></li> + <li><a href="#my-program-does-not-compute-the-correct-result-why-are-my-custom-key-types">My program does not compute the correct result. Why are my custom key types</a></li> + <li><a href="#i-get-a-javalanginstantiationexception-for-my-data-type-what-is-wrong">I get a java.lang.InstantiationException for my data type, what is wrong?</a></li> + <li><a href="#i-cant-stop-flink-with-the-provided-stop-scripts-what-can-i-do">I can’t stop Flink with the provided stop-scripts. What can I do?</a></li> + <li><a href="#i-got-an-outofmemoryexception-what-can-i-do">I got an OutOfMemoryException. What can I do?</a></li> + <li><a href="#why-do-the-taskmanager-log-files-become-so-huge">Why do the TaskManager log files become so huge?</a></li> </ul> </li> - <li><a href="#yarn-deployment" id="markdown-toc-yarn-deployment">YARN Deployment</a> <ul> - <li><a href="#the-yarn-session-runs-only-for-a-few-seconds" id="markdown-toc-the-yarn-session-runs-only-for-a-few-seconds">The YARN session runs only for a few seconds</a></li> - <li><a href="#the-yarn-session-crashes-with-a-hdfs-permission-exception-during-startup" id="markdown-toc-the-yarn-session-crashes-with-a-hdfs-permission-exception-during-startup">The YARN session crashes with a HDFS permission exception during startup</a></li> + <li><a href="#yarn-deployment">YARN Deployment</a> <ul> + <li><a href="#the-yarn-session-runs-only-for-a-few-seconds">The YARN session runs only for a few seconds</a></li> + <li><a href="#the-yarn-session-crashes-with-a-hdfs-permission-exception-during-startup">The YARN session crashes with a HDFS permission exception during startup</a></li> </ul> </li> - <li><a 
href="#features" id="markdown-toc-features">Features</a> <ul> - <li><a href="#what-kind-of-fault-tolerance-does-flink-provide" id="markdown-toc-what-kind-of-fault-tolerance-does-flink-provide">What kind of fault-tolerance does Flink provide?</a></li> - <li><a href="#are-hadoop-like-utilities-such-as-counters-and-the-distributedcache-supported" id="markdown-toc-are-hadoop-like-utilities-such-as-counters-and-the-distributedcache-supported">Are Hadoop-like utilities, such as Counters and the DistributedCache supported?</a></li> + <li><a href="#features">Features</a> <ul> + <li><a href="#what-kind-of-fault-tolerance-does-flink-provide">What kind of fault-tolerance does Flink provide?</a></li> + <li><a href="#are-hadoop-like-utilities-such-as-counters-and-the-distributedcache-supported">Are Hadoop-like utilities, such as Counters and the DistributedCache supported?</a></li> </ul> </li> </ul> @@ -384,7 +384,7 @@ cluster.sh</code>). You can kill their processes on Linux/Mac as follows:</p> <ul> <li>Determine the process id (pid) of the JobManager / TaskManager process. You can use the <code>jps</code> command on Linux (if you have OpenJDK installed) or command -<code>ps -ef | grep java</code> to find all Java processes.</li> +<code>ps -ef | grep java</code> to find all Java processes. 
</li> <li>Kill the process with <code>kill -9 <pid></code>, where <code>pid</code> is the process id of the affected JobManager or TaskManager process.</li> </ul> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/how-to-contribute.html ---------------------------------------------------------------------- diff --git a/content/how-to-contribute.html b/content/how-to-contribute.html index bb119f2..6abc544 100644 --- a/content/how-to-contribute.html +++ b/content/how-to-contribute.html @@ -147,23 +147,23 @@ <div class="page-toc"> <ul id="markdown-toc"> - <li><a href="#easy-issues-for-starters" id="markdown-toc-easy-issues-for-starters">Easy Issues for Starters</a></li> - <li><a href="#contributing-code--documentation" id="markdown-toc-contributing-code--documentation">Contributing Code & Documentation</a> <ul> - <li><a href="#setting-up-the-infrastructure-and-creating-a-pull-request" id="markdown-toc-setting-up-the-infrastructure-and-creating-a-pull-request">Setting up the Infrastructure and Creating a Pull Request</a></li> - <li><a href="#verifying-the-compliance-of-your-code" id="markdown-toc-verifying-the-compliance-of-your-code">Verifying the Compliance of your Code</a></li> + <li><a href="#easy-issues-for-starters">Easy Issues for Starters</a></li> + <li><a href="#contributing-code--documentation">Contributing Code & Documentation</a> <ul> + <li><a href="#setting-up-the-infrastructure-and-creating-a-pull-request">Setting up the Infrastructure and Creating a Pull Request</a></li> + <li><a href="#verifying-the-compliance-of-your-code">Verifying the Compliance of your Code</a></li> </ul> </li> - <li><a href="#contribute-changes-to-the-website" id="markdown-toc-contribute-changes-to-the-website">Contribute changes to the Website</a> <ul> - <li><a href="#files-and-directories-in-the-svn-repository" id="markdown-toc-files-and-directories-in-the-svn-repository">Files and Directories in the SVN repository</a></li> - <li><a href="#the-buildsh-script" 
id="markdown-toc-the-buildsh-script">The <code>build.sh</code> script</a></li> - <li><a href="#submit-a-patch" id="markdown-toc-submit-a-patch">Submit a patch</a></li> + <li><a href="#contribute-changes-to-the-website">Contribute changes to the Website</a> <ul> + <li><a href="#files-and-directories-in-the-svn-repository">Files and Directories in the SVN repository</a></li> + <li><a href="#the-buildsh-script">The <code>build.sh</code> script</a></li> + <li><a href="#submit-a-patch">Submit a patch</a></li> </ul> </li> - <li><a href="#how-to-become-a-committer" id="markdown-toc-how-to-become-a-committer">How to become a committer</a> <ul> - <li><a href="#how-to-use-git-as-a-committer" id="markdown-toc-how-to-use-git-as-a-committer">How to use git as a committer</a></li> + <li><a href="#how-to-become-a-committer">How to become a committer</a> <ul> + <li><a href="#how-to-use-git-as-a-committer">How to use git as a committer</a></li> </ul> </li> - <li><a href="#snapshots-nightly-builds" id="markdown-toc-snapshots-nightly-builds">Snapshots (Nightly Builds)</a></li> + <li><a href="#snapshots-nightly-builds">Snapshots (Nightly Builds)</a></li> </ul> </div> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/material.html ---------------------------------------------------------------------- diff --git a/content/material.html b/content/material.html index 02ecd91..618a5eb 100644 --- a/content/material.html +++ b/content/material.html @@ -145,13 +145,13 @@ <div class="page-toc"> <ul id="markdown-toc"> - <li><a href="#apache-flink-logos" id="markdown-toc-apache-flink-logos">Apache Flink Logos</a> <ul> - <li><a href="#portable-network-graphics-png" id="markdown-toc-portable-network-graphics-png">Portable Network Graphics (PNG)</a></li> - <li><a href="#scalable-vector-graphics-svg" id="markdown-toc-scalable-vector-graphics-svg">Scalable Vector Graphics (SVG)</a></li> - <li><a href="#photoshop-psd" id="markdown-toc-photoshop-psd">Photoshop (PSD)</a></li> + 
<li><a href="#apache-flink-logos">Apache Flink Logos</a> <ul> + <li><a href="#portable-network-graphics-png">Portable Network Graphics (PNG)</a></li> + <li><a href="#scalable-vector-graphics-svg">Scalable Vector Graphics (SVG)</a></li> + <li><a href="#photoshop-psd">Photoshop (PSD)</a></li> </ul> </li> - <li><a href="#slides" id="markdown-toc-slides">Slides</a></li> + <li><a href="#slides">Slides</a></li> </ul> </div> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2014/01/13/stratosphere-release-0.4.html ---------------------------------------------------------------------- diff --git a/content/news/2014/01/13/stratosphere-release-0.4.html b/content/news/2014/01/13/stratosphere-release-0.4.html index 91772fb..ccd568e 100644 --- a/content/news/2014/01/13/stratosphere-release-0.4.html +++ b/content/news/2014/01/13/stratosphere-release-0.4.html @@ -145,7 +145,7 @@ <article> <p>13 Jan 2014</p> -<p>We are pleased to announce that version 0.4 of the Stratosphere system has been released.</p> +<p>We are pleased to announce that version 0.4 of the Stratosphere system has been released. </p> <p>Our team has been working hard during the last few months to create an improved and stable Stratosphere version. The new version comes with many new features, usability and performance improvements in all levels, including a new Scala API for the concise specification of programs, a Pregel-like API, support for Yarn clusters, and major performance improvements. The system features now first-class support for iterative programs and thus covers traditional analytical use cases as well as data mining and graph processing use cases with great performance.</p> @@ -173,7 +173,7 @@ Follow <a href="/docs/0.4/setup/yarn.html">our guide</a> on how to start a Strat <p>The high-level language Meteor now natively serializes JSON trees for greater performance and offers additional operators and file formats. 
We greatly empowered the user to write crispier scripts by adding second-order functions, multi-output operators, and other syntactical sugar. For developers of Meteor packages, the API is much more comprehensive and allows to define custom data types that can be easily embedded in JSON trees through ad-hoc byte code generation.</p> <h3 id="spargel-pregel-inspired-graph-processing">Spargel: Pregel Inspired Graph Processing</h3> -<p>Spargel is a vertex-centric API similar to the interface proposed in Google’s Pregel paper and implemented in Apache Giraph. Spargel is implemented in 500 lines of code (including comments) on top of Stratosphere’s delta iterations feature. This confirms the flexibility of Stratosphere’s architecture.</p> +<p>Spargel is a vertex-centric API similar to the interface proposed in Google’s Pregel paper and implemented in Apache Giraph. Spargel is implemented in 500 lines of code (including comments) on top of Stratosphere’s delta iterations feature. This confirms the flexibility of Stratosphere’s architecture. </p> <h3 id="web-frontend">Web Frontend</h3> <p>Using the new web frontend, you can monitor the progress of Stratosphere jobs. For finished jobs, the frontend shows a breakdown of the execution times for each operator. The webclient also visualizes the execution strategies chosen by the optimizer.</p> @@ -201,7 +201,7 @@ Follow <a href="/docs/0.4/setup/yarn.html">our guide</a> on how to start a Strat </ul> <h3 id="download-and-get-started-with-stratosphere-v04">Download and get started with Stratosphere v0.4</h3> -<p>There are several options for getting started with Stratosphere.</p> +<p>There are several options for getting started with Stratosphere. 
</p> <ul> <li>Download it on the <a href="/downloads">download page</a></li> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2014/01/28/querying_mongodb.html ---------------------------------------------------------------------- diff --git a/content/news/2014/01/28/querying_mongodb.html b/content/news/2014/01/28/querying_mongodb.html index 6a83f33..1a345b6 100644 --- a/content/news/2014/01/28/querying_mongodb.html +++ b/content/news/2014/01/28/querying_mongodb.html @@ -147,16 +147,16 @@ <p>We recently merged a <a href="https://github.com/stratosphere/stratosphere/pull/437">pull request</a> that allows you to use any existing Hadoop <a href="http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat">InputFormat</a> with Stratosphere. So you can now (in the <code>0.5-SNAPSHOT</code> and upwards versions) define a Hadoop-based data source:</p> -<div class="highlight"><pre><code class="language-java"><span class="n">HadoopDataSource</span> <span class="n">source</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">HadoopDataSource</span><span class="o">(</span><span class="k">new</span> <span class="nf">TextInputFormat</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">(),</span> <span class="s">"Input Lines"</span><span class="o">);</span> -<span class="n">TextInputFormat</span><span class="o">.</span><span class="na">addInputPath</span><span class="o">(</span><span class="n">source</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Path</span><span class="o">(</span><span class="n">dataInput</span><span class="o">));</span></code></pre></div> +<div class="highlight"><pre><code class="language-java"><span class="n">HadoopDataSource</span> <span class="n">source</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HadoopDataSource</span><span 
class="o">(</span><span class="k">new</span> <span class="n">TextInputFormat</span><span class="o">(),</span> <span class="k">new</span> <span class="n">JobConf</span><span class="o">(),</span> <span class="s">"Input Lines"</span><span class="o">);</span> +<span class="n">TextInputFormat</span><span class="o">.</span><span class="na">addInputPath</span><span class="o">(</span><span class="n">source</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Path</span><span class="o">(</span><span class="n">dataInput</span><span class="o">));</span></code></pre></div> <p>We describe in the following article how to access data stored in <a href="http://www.mongodb.org/">MongoDB</a> with Stratosphere. This allows users to join data from multiple sources (e.g. MongoDB and HDFS) or perform machine learning with the documents stored in MongoDB.</p> <p>The approach here is to use the <code>MongoInputFormat</code> that was developed for Apache Hadoop but now also runs with Stratosphere.</p> -<div class="highlight"><pre><code class="language-java"><span class="n">JobConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">();</span> +<div class="highlight"><pre><code class="language-java"><span class="n">JobConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JobConf</span><span class="o">();</span> <span class="n">conf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"mongo.input.uri"</span><span class="o">,</span><span class="s">"mongodb://localhost:27017/enron_mail.messages"</span><span class="o">);</span> -<span class="n">HadoopDataSource</span> <span class="n">src</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">HadoopDataSource</span><span class="o">(</span><span class="k">new</span> 
<span class="nf">MongoInputFormat</span><span class="o">(),</span> <span class="n">conf</span><span class="o">,</span> <span class="s">"Read from Mongodb"</span><span class="o">,</span> <span class="k">new</span> <span class="nf">WritableWrapperConverter</span><span class="o">());</span></code></pre></div> +<span class="n">HadoopDataSource</span> <span class="n">src</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HadoopDataSource</span><span class="o">(</span><span class="k">new</span> <span class="n">MongoInputFormat</span><span class="o">(),</span> <span class="n">conf</span><span class="o">,</span> <span class="s">"Read from Mongodb"</span><span class="o">,</span> <span class="k">new</span> <span class="n">WritableWrapperConverter</span><span class="o">());</span></code></pre></div> <h3 id="example-program">Example Program</h3> <p>The example program reads data from the <a href="http://www.cs.cmu.edu/~enron/">enron dataset</a> that contains about 500k internal e-mails. The data is stored in MongoDB and the Stratosphere program counts the number of e-mails per day.</p> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html ---------------------------------------------------------------------- diff --git a/content/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html b/content/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html index 260feb8..d89112a 100644 --- a/content/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html +++ b/content/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html @@ -215,7 +215,7 @@ ssh [email protected] -i ~/Downloads/work-laptop.pem</code></pre></div> <p>(Windows users have to follow <a href="http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-ssh.html">these instructions</a> to SSH into the machine running the master.) 
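The MongoDB example above reads the Enron corpus and counts e-mails per day. Setting aside the Stratosphere/Hadoop wiring (which needs the `MongoInputFormat` and a running MongoDB), the per-day aggregation the job performs can be sketched in dependency-free Java. The class and method names below are illustrative only, not part of the Stratosphere API:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: mimics the "count e-mails per day" aggregation of the
// Stratosphere/MongoDB example without any Stratosphere or Hadoop classes.
public class MailsPerDay {
    static Map<String, Integer> countPerDay(List<String> dates) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by day
        for (String day : dates) {
            counts.merge(day, 1, Integer::sum); // reduce step: sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        // In the real job, each date would come from a MongoDB document header.
        System.out.println(countPerDay(
            List.of("2001-05-14", "2001-05-14", "2001-05-15")));
        // prints {2001-05-14=2, 2001-05-15=1}
    }
}
```

In the actual Stratosphere program this grouping and summing would be expressed as a reduce over the `DataSet` produced by the `HadoopDataSource` shown in the diff.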
</br></br> -Once connected to the master, download and start Stratosphere for YARN:</p> +Once connected to the master, download and start Stratosphere for YARN: </p> <ul> <li>Download and extract Stratosphere-YARN</li> @@ -227,7 +227,7 @@ tar xvzf stratosphere-dist-0.5-SNAPSHOT-yarn.tar.gz</code></pre></div> <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd </span>stratosphere-yarn-0.5-SNAPSHOT/ -./bin/yarn-session.sh -n <span class="m">4</span> -jm <span class="m">1024</span> -tm 3000</code></pre></div> +./bin/yarn-session.sh -n 4 -jm 1024 -tm 3000</code></pre></div> The arguments have the following meaning @@ -238,11 +238,11 @@ The arguments have the following meaning </ul> </ul> -<p>Once the output has changed from</p> +<p>Once the output has changed from </p> <div class="highlight"><pre><code class="language-bash" data-lang="bash">JobManager is now running on N/A:6123</code></pre></div> -<p>to</p> +<p>to </p> <div class="highlight"><pre><code class="language-bash" data-lang="bash">JobManager is now running on ip-172-31-13-68.us-west-2.compute.internal:6123</code></pre></div> @@ -308,7 +308,7 @@ hadoop fs -copyFromLocal gpl.txt /input</code></pre></div> <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># optional: go to the extracted directory</span> <span class="nb">cd </span>stratosphere-yarn-0.5-SNAPSHOT/ <span class="c"># run the wordcount example</span> -./bin/stratosphere run -w -j examples/stratosphere-java-examples-0.5-SNAPSHOT-WordCount.jar -a <span class="m">16</span> hdfs:///input hdfs:///output</code></pre></div> +./bin/stratosphere run -w -j examples/stratosphere-java-examples-0.5-SNAPSHOT-WordCount.jar -a 16 hdfs:///input hdfs:///output</code></pre></div> <p>Make sure that the expected number of TaskManagers have connected to the JobManager.</p> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2014/11/04/release-0.7.0.html 
---------------------------------------------------------------------- diff --git a/content/news/2014/11/04/release-0.7.0.html b/content/news/2014/11/04/release-0.7.0.html index e25794b..5cf851b 100644 --- a/content/news/2014/11/04/release-0.7.0.html +++ b/content/news/2014/11/04/release-0.7.0.html @@ -165,7 +165,7 @@ <p><strong>Record API deprecated:</strong> The (old) Stratosphere Record API has been marked as deprecated and is planned for removal in the 0.9.0 release.</p> -<p><strong>BLOB service:</strong> This release contains a new service to distribute jar files and other binary data among the JobManager, TaskManagers and the client.</p> +<p><strong>BLOB service:</strong> This release contains a new service to distribute jar files and other binary data among the JobManager, TaskManagers and the client. </p> <p><strong>Intermediate data sets:</strong> A major rewrite of the system internals introduces intermediate data sets as first class citizens. The internal state machine that tracks the distributed tasks has also been completely rewritten for scalability. While this is not visible as a user-facing feature yet, it is the foundation for several upcoming exciting features.</p> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2014/11/18/hadoop-compatibility.html ---------------------------------------------------------------------- diff --git a/content/news/2014/11/18/hadoop-compatibility.html b/content/news/2014/11/18/hadoop-compatibility.html index 2e01091..ccf24f2 100644 --- a/content/news/2014/11/18/hadoop-compatibility.html +++ b/content/news/2014/11/18/hadoop-compatibility.html @@ -153,7 +153,7 @@ <img src="/img/blog/hcompat-logos.png" style="width:30%;margin:15px" /> </center> -<p>To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. 
This package was developed as part of a <a href="https://developers.google.com/open-source/soc/">Google Summer of Code</a> 2014 project.</p> +<p>To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a <a href="https://developers.google.com/open-source/soc/">Google Summer of Code</a> 2014 project. </p> <p>With the Hadoop Compatibility package, you can reuse all your Hadoop</p> @@ -166,7 +166,7 @@ <p>in Flink programs without changing a line of code. Moreover, Flink also natively supports all Hadoop data types (<code>Writables</code> and <code>WritableComparable</code>).</p> -<p>The following code snippet shows a simple Flink WordCount program that solely uses Hadoop data types, InputFormat, OutputFormat, Mapper, and Reducer functions.</p> +<p>The following code snippet shows a simple Flink WordCount program that solely uses Hadoop data types, InputFormat, OutputFormat, Mapper, and Reducer functions. 
</p> <div class="highlight"><pre><code class="language-java"><span class="c1">// Definition of Hadoop Mapper function</span> <span class="kd">public</span> <span class="kd">class</span> <span class="nc">Tokenizer</span> <span class="kd">implements</span> <span class="n">Mapper</span><span class="o"><</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">></span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span> @@ -182,25 +182,25 @@ <span class="c1">// Setup Hadoop’s TextInputFormat</span> <span class="n">HadoopInputFormat</span><span class="o"><</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">></span> <span class="n">hadoopInputFormat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HadoopInputFormat</span><span class="o"><</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">>(</span> - <span class="k">new</span> <span class="nf">TextInputFormat</span><span class="o">(),</span> <span class="n">LongWritable</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">Text</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">());</span> - <span class="n">TextInputFormat</span><span class="o">.</span><span class="na">addInputPath</span><span class="o">(</span><span class="n">hadoopInputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Path</span><span class="o">(</span><span class="n">inputPath</span><span class="o">));</span> + <span class="k">new</span> <span class="nf">TextInputFormat</span><span 
class="o">(),</span> <span class="n">LongWritable</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">Text</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="k">new</span> <span class="n">JobConf</span><span class="o">());</span> + <span class="n">TextInputFormat</span><span class="o">.</span><span class="na">addInputPath</span><span class="o">(</span><span class="n">hadoopInputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Path</span><span class="o">(</span><span class="n">inputPath</span><span class="o">));</span> <span class="c1">// Read a DataSet with the Hadoop InputFormat</span> <span class="n">DataSet</span><span class="o"><</span><span class="n">Tuple2</span><span class="o"><</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">>></span> <span class="n">text</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">createInput</span><span class="o">(</span><span class="n">hadoopInputFormat</span><span class="o">);</span> <span class="n">DataSet</span><span class="o"><</span><span class="n">Tuple2</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">>></span> <span class="n">words</span> <span class="o">=</span> <span class="n">text</span> <span class="c1">// Wrap Tokenizer Mapper function</span> - <span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">HadoopMapFunction</span><span class="o"><</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">>(</span><span 
class="k">new</span> <span class="nf">Tokenizer</span><span class="o">()))</span> + <span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">HadoopMapFunction</span><span class="o"><</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">>(</span><span class="k">new</span> <span class="n">Tokenizer</span><span class="o">()))</span> <span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span> <span class="c1">// Wrap Counter Reducer function (used as Reducer and Combiner)</span> <span class="o">.</span><span class="na">reduceGroup</span><span class="o">(</span><span class="k">new</span> <span class="n">HadoopReduceCombineFunction</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">>(</span> - <span class="k">new</span> <span class="nf">Counter</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Counter</span><span class="o">()));</span> + <span class="k">new</span> <span class="nf">Counter</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Counter</span><span class="o">()));</span> <span class="c1">// Setup Hadoop’s TextOutputFormat</span> <span class="n">HadoopOutputFormat</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">></span> <span class="n">hadoopOutputFormat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HadoopOutputFormat</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span> <span 
class="n">LongWritable</span><span class="o">>(</span> - <span class="k">new</span> <span class="n">TextOutputFormat</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">>(),</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">());</span> + <span class="k">new</span> <span class="n">TextOutputFormat</span><span class="o"><</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">>(),</span> <span class="k">new</span> <span class="n">JobConf</span><span class="o">());</span> <span class="n">hadoopOutputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">().</span><span class="na">set</span><span class="o">(</span><span class="s">"mapred.textoutputformat.separator"</span><span class="o">,</span> <span class="s">" "</span><span class="o">);</span> - <span class="n">TextOutputFormat</span><span class="o">.</span><span class="na">setOutputPath</span><span class="o">(</span><span class="n">hadoopOutputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Path</span><span class="o">(</span><span class="n">outputPath</span><span class="o">));</span> + <span class="n">TextOutputFormat</span><span class="o">.</span><span class="na">setOutputPath</span><span class="o">(</span><span class="n">hadoopOutputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Path</span><span class="o">(</span><span class="n">outputPath</span><span class="o">));</span> <span class="c1">// Output & Execute</span> <span class="n">words</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">hadoopOutputFormat</span><span class="o">);</span> 
http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2015/01/21/release-0.8.html ---------------------------------------------------------------------- diff --git a/content/news/2015/01/21/release-0.8.html b/content/news/2015/01/21/release-0.8.html index e31c131..1951ac5 100644 --- a/content/news/2015/01/21/release-0.8.html +++ b/content/news/2015/01/21/release-0.8.html @@ -196,7 +196,7 @@ <li>Stefan Bunk</li> <li>Paris Carbone</li> <li>Ufuk Celebi</li> - <li>Nils Engelbach</li> + <li>Nils Engelbach </li> <li>Stephan Ewen</li> <li>Gyula Fora</li> <li>Gabor Hermann</li> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2015/02/04/january-in-flink.html ---------------------------------------------------------------------- diff --git a/content/news/2015/02/04/january-in-flink.html b/content/news/2015/02/04/january-in-flink.html index f39248c..7c6baa6 100644 --- a/content/news/2015/02/04/january-in-flink.html +++ b/content/news/2015/02/04/january-in-flink.html @@ -177,7 +177,7 @@ <h3 id="using-off-heap-memoryhttpsgithubcomapacheflinkpull290"><a href="https://github.com/apache/flink/pull/290">Using off-heap memory</a></h3> -<p>This pull request enables Flink to use off-heap memory for its internal memory uses (sort, hash, caching of intermediate data sets).</p> +<p>This pull request enables Flink to use off-heap memory for its internal memory uses (sort, hash, caching of intermediate data sets). 
</p> <h3 id="gelly-flinks-graph-apihttpsgithubcomapacheflinkpull335"><a href="https://github.com/apache/flink/pull/335">Gelly, Flink’s Graph API</a></h3> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2015/02/09/streaming-example.html ---------------------------------------------------------------------- diff --git a/content/news/2015/02/09/streaming-example.html b/content/news/2015/02/09/streaming-example.html index 186a07e..b5cc6c0 100644 --- a/content/news/2015/02/09/streaming-example.html +++ b/content/news/2015/02/09/streaming-example.html @@ -167,7 +167,7 @@ between the market data streams and a Twitter stream with stock mentions.</p> <p>For running the example implementation please use the <em>0.9-SNAPSHOT</em> version of Flink as a dependency. The full example code base can be -found <a href="https://github.com/apache/flink/blob/master/flink-staging/flink-streaming/flink-streaming-examples/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/StockPrices.scala">here</a> in Scala and <a href="https://github.com/apache/flink/blob/master/flink-staging/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/windowing/StockPrices.java">here</a> in Java7.</p> +found <a href="https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/StockPrices.scala">here</a> in Scala and <a href="https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/windowing/StockPrices.java">here</a> in Java7.</p> <p><a href="#top"></a></p> @@ -181,7 +181,7 @@ found <a href="https://github.com/apache/flink/blob/master/flink-staging/flink-s <li>Read a socket stream of stock prices</li> <li>Parse the text in the stream to create a stream of <code>StockPrice</code> objects</li> <li>Add four other sources 
tagged with the stock symbol.</li> - <li>Finally, merge the streams to create a unified stream.</li> + <li>Finally, merge the streams to create a unified stream. </li> </ol> <p><img alt="Reading from multiple inputs" src="/img/blog/blog_multi_input.png" width="70%" class="img-responsive center-block" /></p> @@ -237,10 +237,10 @@ found <a href="https://github.com/apache/flink/blob/master/flink-staging/flink-s <span class="o">});</span> <span class="c1">//Generate other stock streams</span> - <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">SPX_stream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="nf">StockSource</span><span class="o">(</span><span class="s">"SPX"</span><span class="o">,</span> <span class="mi">10</span><span class="o">));</span> - <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">FTSE_stream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="nf">StockSource</span><span class="o">(</span><span class="s">"FTSE"</span><span class="o">,</span> <span class="mi">20</span><span class="o">));</span> - <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">DJI_stream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="nf">StockSource</span><span class="o">(</span><span class="s">"DJI"</span><span class="o">,</span> <span class="mi">30</span><span class="o">));</span> - <span class="n">DataStream</span><span class="o"><</span><span 
class="n">StockPrice</span><span class="o">></span> <span class="n">BUX_stream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="nf">StockSource</span><span class="o">(</span><span class="s">"BUX"</span><span class="o">,</span> <span class="mi">40</span><span class="o">));</span> + <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">SPX_stream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="n">StockSource</span><span class="o">(</span><span class="s">"SPX"</span><span class="o">,</span> <span class="mi">10</span><span class="o">));</span> + <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">FTSE_stream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="n">StockSource</span><span class="o">(</span><span class="s">"FTSE"</span><span class="o">,</span> <span class="mi">20</span><span class="o">));</span> + <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">DJI_stream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="n">StockSource</span><span class="o">(</span><span class="s">"DJI"</span><span class="o">,</span> <span class="mi">30</span><span class="o">));</span> + <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">BUX_stream</span> <span 
class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="n">StockSource</span><span class="o">(</span><span class="s">"BUX"</span><span class="o">,</span> <span class="mi">40</span><span class="o">));</span> <span class="c1">//Merge all stock streams together</span> <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">stockStream</span> <span class="o">=</span> <span class="n">socketStockStream</span> @@ -321,11 +321,11 @@ of this example, the data streams are simply generated using the <span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">invoke</span><span class="o">(</span><span class="n">Collector</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">collector</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span> <span class="n">price</span> <span class="o">=</span> <span class="n">DEFAULT_PRICE</span><span class="o">;</span> - <span class="n">Random</span> <span class="n">random</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Random</span><span class="o">();</span> + <span class="n">Random</span> <span class="n">random</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Random</span><span class="o">();</span> <span class="k">while</span> <span class="o">(</span><span class="kc">true</span><span class="o">)</span> <span class="o">{</span> <span class="n">price</span> <span class="o">=</span> <span class="n">price</span> <span class="o">+</span> <span class="n">random</span><span class="o">.</span><span class="na">nextGaussian</span><span class="o">()</span> <span class="o">*</span> <span class="n">sigma</span><span class="o">;</span> - <span 
class="n">collector</span><span class="o">.</span><span class="na">collect</span><span class="o">(</span><span class="k">new</span> <span class="nf">StockPrice</span><span class="o">(</span><span class="n">symbol</span><span class="o">,</span> <span class="n">price</span><span class="o">));</span> + <span class="n">collector</span><span class="o">.</span><span class="na">collect</span><span class="o">(</span><span class="k">new</span> <span class="n">StockPrice</span><span class="o">(</span><span class="n">symbol</span><span class="o">,</span> <span class="n">price</span><span class="o">));</span> <span class="n">Thread</span><span class="o">.</span><span class="na">sleep</span><span class="o">(</span><span class="n">random</span><span class="o">.</span><span class="na">nextInt</span><span class="o">(</span><span class="mi">200</span><span class="o">));</span> <span class="o">}</span> <span class="o">}</span> @@ -412,7 +412,7 @@ performed on named fields of POJOs, making the code more readable.</p> <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">maxByStock</span> <span class="o">=</span> <span class="n">windowedStream</span><span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="s">"symbol"</span><span class="o">)</span> <span class="o">.</span><span class="na">maxBy</span><span class="o">(</span><span class="s">"price"</span><span class="o">).</span><span class="na">flatten</span><span class="o">();</span> <span class="n">DataStream</span><span class="o"><</span><span class="n">StockPrice</span><span class="o">></span> <span class="n">rollingMean</span> <span class="o">=</span> <span class="n">windowedStream</span><span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="s">"symbol"</span><span class="o">)</span> - <span class="o">.</span><span class="na">mapWindow</span><span class="o">(</span><span 
class="k">new</span> <span class="nf">WindowMean</span><span class="o">()).</span><span class="na">flatten</span><span class="o">();</span> + <span class="o">.</span><span class="na">mapWindow</span><span class="o">(</span><span class="k">new</span> <span class="n">WindowMean</span><span class="o">()).</span><span class="na">flatten</span><span class="o">();</span> <span class="c1">//Compute the mean of a window</span> <span class="kd">public</span> <span class="kd">final</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">WindowMean</span> <span class="kd">implements</span> @@ -427,12 +427,12 @@ performed on named fields of POJOs, making the code more readable.</p> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span> <span class="k">if</span> <span class="o">(</span><span class="n">values</span><span class="o">.</span><span class="na">iterator</span><span class="o">().</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span><span class="n">s</span> - <span class="nf">for</span> <span class="o">(</span><span class="n">StockPrice</span> <span class="n">sp</span> <span class="o">:</span> <span class="n">values</span><span class="o">)</span> <span class="o">{</span> + <span class="k">for</span> <span class="o">(</span><span class="n">StockPrice</span> <span class="n">sp</span> <span class="o">:</span> <span class="n">values</span><span class="o">)</span> <span class="o">{</span> <span class="n">sum</span> <span class="o">+=</span> <span class="n">sp</span><span class="o">.</span><span class="na">price</span><span class="o">;</span> <span class="n">symbol</span> <span class="o">=</span> <span class="n">sp</span><span class="o">.</span><span class="na">symbol</span><span class="o">;</span> <span class="n">count</span><span class="o">++;</span> <span class="o">}</span> - <span class="n">out</span><span class="o">.</span><span class="na">collect</span><span 
class="o">(</span><span class="k">new</span> <span class="nf">StockPrice</span><span class="o">(</span><span class="n">symbol</span><span class="o">,</span> <span class="n">sum</span> <span class="o">/</span> <span class="n">count</span><span class="o">));</span> + <span class="n">out</span><span class="o">.</span><span class="na">collect</span><span class="o">(</span><span class="k">new</span> <span class="n">StockPrice</span><span class="o">(</span><span class="n">symbol</span><span class="o">,</span> <span class="n">sum</span> <span class="o">/</span> <span class="n">count</span><span class="o">));</span> <span class="o">}</span> <span class="o">}</span> <span class="o">}</span></code></pre></div> @@ -492,7 +492,7 @@ every 30 seconds.</p> <div data-lang="java7"> <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">Double</span> <span class="n">DEFAULT_PRICE</span> <span class="o">=</span> <span class="mi">1000</span><span class="o">.;</span> -<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">StockPrice</span> <span class="n">DEFAULT_STOCK_PRICE</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StockPrice</span><span class="o">(</span><span class="s">""</span><span class="o">,</span> <span class="n">DEFAULT_PRICE</span><span class="o">);</span> +<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">StockPrice</span> <span class="n">DEFAULT_STOCK_PRICE</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StockPrice</span><span class="o">(</span><span class="s">""</span><span class="o">,</span> <span class="n">DEFAULT_PRICE</span><span class="o">);</span> <span class="c1">//Use delta policy to create price change warnings</span> <span class="n">DataStream</span><span 
class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">priceWarnings</span> <span class="o">=</span> <span class="n">stockStream</span><span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="s">"symbol"</span><span class="o">)</span> @@ -502,7 +502,7 @@ every 30 seconds.</p> <span class="k">return</span> <span class="n">Math</span><span class="o">.</span><span class="na">abs</span><span class="o">(</span><span class="n">oldDataPoint</span><span class="o">.</span><span class="na">price</span> <span class="o">-</span> <span class="n">newDataPoint</span><span class="o">.</span><span class="na">price</span><span class="o">);</span> <span class="o">}</span> <span class="o">},</span> <span class="n">DEFAULT_STOCK_PRICE</span><span class="o">))</span> -<span class="o">.</span><span class="na">mapWindow</span><span class="o">(</span><span class="k">new</span> <span class="nf">SendWarning</span><span class="o">()).</span><span class="na">flatten</span><span class="o">();</span> +<span class="o">.</span><span class="na">mapWindow</span><span class="o">(</span><span class="k">new</span> <span class="n">SendWarning</span><span class="o">()).</span><span class="na">flatten</span><span class="o">();</span> <span class="c1">//Count the number of warnings every half a minute</span> <span class="n">DataStream</span><span class="o"><</span><span class="n">Count</span><span class="o">></span> <span class="n">warningsPerStock</span> <span class="o">=</span> <span class="n">priceWarnings</span><span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="n">MapFunction</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Count</span><span class="o">>()</span> <span class="o">{</span> @@ -590,7 +590,7 @@ but for the sake of this example we generate dummy tweet data.</p> <div data-lang="java7"> <div 
class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">//Read a stream of tweets</span> -<span class="n">DataStream</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">tweetStream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="nf">TweetSource</span><span class="o">());</span> +<span class="n">DataStream</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">tweetStream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="k">new</span> <span class="n">TweetSource</span><span class="o">());</span> <span class="c1">//Extract the stock symbols</span> <span class="n">DataStream</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">mentionedSymbols</span> <span class="o">=</span> <span class="n">tweetStream</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span> @@ -623,8 +623,8 @@ but for the sake of this example we generate dummy tweet data.</p> <span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">invoke</span><span class="o">(</span><span class="n">Collector</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">collector</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span> - <span class="n">random</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Random</span><span class="o">();</span> - <span class="n">stringBuilder</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StringBuilder</span><span class="o">();</span> + <span 
class="n">random</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Random</span><span class="o">();</span> + <span class="n">stringBuilder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StringBuilder</span><span class="o">();</span> <span class="k">while</span> <span class="o">(</span><span class="kc">true</span><span class="o">)</span> <span class="o">{</span> <span class="n">stringBuilder</span><span class="o">.</span><span class="na">setLength</span><span class="o">(</span><span class="mi">0</span><span class="o">);</span> @@ -653,7 +653,7 @@ number of mentions of a given stock in the Twitter stream. As both of these data streams are potentially infinite, we apply the join on a 30-second window.</p> -<p><img alt="Streaming joins" src="/img/blog/blog_stream_join.png" width="60%" class="img-responsive center-block" /></p> +<p><img alt="Streaming joins" src="/img/blog/blog_stream_join.png" width="60%" class="img-responsive center-block" /> </p> <div class="codetabs"> @@ -698,7 +698,7 @@ these data streams are potentially infinite, we apply the join on a <span class="o">.</span><span class="na">equalTo</span><span class="o">(</span><span class="s">"symbol"</span><span class="o">)</span> <span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="k">new</span> <span class="n">JoinFunction</span><span class="o"><</span><span class="n">Count</span><span class="o">,</span> <span class="n">Count</span><span class="o">,</span> <span class="n">Tuple2</span><span class="o"><</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">>>()</span> <span class="o">{</span> <span class="nd">@Override</span> - <span class="kd">public</span> <span class="n">Tuple2</span><span class="o"><</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">></span> <span class="nf">join</span><span 
class="o">(</span><span class="n">Count</span> <span class="n">first</span><span class="o">,</span> <span class="n">Count</span> <span class="n">second</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span> + <span class="kd">public</span> <span class="n">Tuple2</span><span class="o"><</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">join</span><span class="o">(</span><span class="n">Count</span> <span class="n">first</span><span class="o">,</span> <span class="n">Count</span> <span class="n">second</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span> <span class="k">return</span> <span class="k">new</span> <span class="n">Tuple2</span><span class="o"><</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">>(</span><span class="n">first</span><span class="o">.</span><span class="na">count</span><span class="o">,</span> <span class="n">second</span><span class="o">.</span><span class="na">count</span><span class="o">);</span> <span class="o">}</span> <span class="o">});</span> @@ -706,7 +706,7 @@ these data streams are potentially infinite, we apply the join on a <span class="c1">//Compute rolling correlation</span> <span class="n">DataStream</span><span class="o"><</span><span class="n">Double</span><span class="o">></span> <span class="n">rollingCorrelation</span> <span class="o">=</span> <span class="n">tweetsAndWarning</span> <span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">Time</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="mi">30</span><span class="o">,</span> <span class="n">TimeUnit</span><span class="o">.</span><span class="na">SECONDS</span><span class="o">))</span> - <span class="o">.</span><span 
class="na">mapWindow</span><span class="o">(</span><span class="k">new</span> <span class="nf">WindowCorrelation</span><span class="o">());</span> + <span class="o">.</span><span class="na">mapWindow</span><span class="o">(</span><span class="k">new</span> <span class="n">WindowCorrelation</span><span class="o">());</span> <span class="n">rollingCorrelation</span><span class="o">.</span><span class="na">print</span><span class="o">();</span> http://git-wip-us.apache.org/repos/asf/flink-web/blob/01ee1809/content/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html ---------------------------------------------------------------------- diff --git a/content/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html b/content/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html index 1523e56..737252f 100644 --- a/content/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html +++ b/content/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html @@ -152,7 +152,7 @@ <p>In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins. Specifically, I will</p> <ul> - <li>show how easy it is to join data sets using Flink’s fluent APIs,</li> + <li>show how easy it is to join data sets using Flink’s fluent APIs, </li> <li>discuss basic distributed join strategies, Flink’s join implementations, and its memory management,</li> <li>talk about Flink’s optimizer that automatically chooses join strategies,</li> <li>show some performance numbers for joining data sets of different sizes, and finally</li> @@ -163,7 +163,7 @@ <h3 id="how-do-i-join-with-flink">How do I join with Flink?</h3> -<p>Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections which are called data sets. data sets are processed by applying Transformations that compute new data sets. 
Flink’s transformations include Map and Reduce as known from MapReduce <a href="http://research.google.com/archive/mapreduce.html">[1]</a> but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html">[2]</a>.</p> +<p>Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections which are called data sets. data sets are processed by applying Transformations that compute new data sets. Flink’s transformations include Map and Reduce as known from MapReduce <a href="http://research.google.com/archive/mapreduce.html">[1]</a> but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html">[2]</a>. </p> <p>Joining two Scala case class data sets is very easy as the following example shows:</p> @@ -200,7 +200,7 @@ <ol> <li>The data of both inputs is distributed across all parallel instances that participate in the join and</li> - <li>each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data.</li> + <li>each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data. </li> </ol> <p>The distribution of data across parallel instances must ensure that each valid join pair can be locally built by exactly one instance. For both steps, there are multiple valid strategies that can be independently picked and which are favorable in different situations. In Flink terminology, the first phase is called Ship Strategy and the second phase Local Strategy. 
In the following I will describe Flink’s ship and local strategies to join two data sets <em>R</em> and <em>S</em>.</p> @@ -219,7 +219,7 @@ <img src="/img/blog/joins-repartition.png" style="width:90%;margin:15px" /> </center> -<p>The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how both ship strategies work.</p> +<p>The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how both ship strategies work. </p> <center> <img src="/img/blog/joins-broadcast.png" style="width:90%;margin:15px" /> @@ -228,7 +228,7 @@ <p>The Repartition-Repartition and Broadcast-Forward ship strategies establish suitable data distributions to execute a distributed join. Depending on the operations that are applied before the join, one or even both inputs of a join are already distributed in a suitable way across parallel instances. In this case, Flink will reuse such distributions and only ship one or no input at all.</p> <h4 id="flinks-memory-management">Flink’s Memory Management</h4> -<p>Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory.
While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an <code>OutOfMemoryException</code> and kill the JVM.</p> +<p>Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an <code>OutOfMemoryException</code> and kill the JVM. </p> <p>Flink handles this challenge by actively managing its memory. When a worker node (TaskManager) is started, it allocates a fixed portion (70% by default) of the JVM’s heap memory that is available after initialization as 32KB byte arrays. These byte arrays are distributed as working memory to all algorithms that need to hold significant portions of data in memory. The algorithms receive their input data as Java data objects and serialize them into their working memory.</p> @@ -245,7 +245,7 @@ <p>After the data has been distributed across all parallel join instances using either a Repartition-Repartition or Broadcast-Forward ship strategy, each instance runs a local join algorithm to join the elements of its local partition.
Flink’s runtime features two common join strategies to perform these local joins:</p> <ul> - <li>the <em>Sort-Merge-Join</em> strategy (SM) and</li> + <li>the <em>Sort-Merge-Join</em> strategy (SM) and </li> <li>the <em>Hybrid-Hash-Join</em> strategy (HH).</li> </ul> @@ -290,13 +290,13 @@ <ul> <li>1GB : 1000GB</li> <li>10GB : 1000GB</li> - <li>100GB : 1000GB</li> + <li>100GB : 1000GB </li> <li>1000GB : 1000GB</li> </ul> <p>The Broadcast-Forward strategy is only executed for up to 10GB. Building a hash table from 100GB broadcasted data in 5GB working memory would result in spilling approximately 95GB (build input) + 950GB (probe input) in each parallel thread and require more than 8TB local disk storage on each machine.</p> -<p>As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80.</p> +<p>As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80.
</p> <center> <img src="/img/blog/joins-dist-perf.png" style="width:70%;margin:15px" /> </center> @@ -313,7 +313,7 @@ <ul> <li>Flink’s fluent Scala and Java APIs make joins and other data transformations easy as cake.</li> <li>The optimizer makes the hard choices for you, but gives you control in case you know better.</li> - <li>Flink’s join implementations perform very well in-memory and gracefully degrade when going to disk.</li> + <li>Flink’s join implementations perform very well in-memory and gracefully degrade when going to disk. </li> <li>Due to Flink’s robust memory management, there is no need for job- or data-specific memory tuning to avoid a nasty <code>OutOfMemoryException</code>. It just runs out-of-the-box.</li> </ul>
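Editor's note: the two-phase scheme the patched blog post describes (a ship strategy that co-locates matching keys, then an independent local join per parallel instance) can be sketched outside Flink in a few lines of plain Java. This is an illustrative toy, not Flink code; the class and method names are made up, and records are simplified to `int[]{key, value}` pairs.

```java
import java.util.*;

// Hypothetical sketch of a Repartition-Repartition join: hash-partition both
// inputs by key (ship phase), then run a hash join per partition (local phase).
public class RepartitionJoinSketch {

    // Ship strategy: assign each record to a parallel instance by hashing its key.
    static int partition(int key, int parallelism) {
        return Math.floorMod(Integer.hashCode(key), parallelism);
    }

    // Local strategy: build a hash table on R's partition, probe it with S's partition.
    static List<String> localHashJoin(List<int[]> r, List<int[]> s) {
        Map<Integer, List<int[]>> build = new HashMap<>();
        for (int[] rec : r) build.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(rec);
        List<String> out = new ArrayList<>();
        for (int[] probe : s)
            for (int[] match : build.getOrDefault(probe[0], List.of()))
                out.add(probe[0] + ":" + match[1] + "," + probe[1]);
        return out;
    }

    public static void main(String[] args) {
        int parallelism = 2;
        List<int[]> r = List.of(new int[]{1, 10}, new int[]{2, 20});
        List<int[]> s = List.of(new int[]{1, 100}, new int[]{3, 300});
        // Phase 1: partition both inputs the same way so matching keys co-locate.
        List<List<int[]>> rParts = new ArrayList<>(), sParts = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) { rParts.add(new ArrayList<>()); sParts.add(new ArrayList<>()); }
        for (int[] rec : r) rParts.get(partition(rec[0], parallelism)).add(rec);
        for (int[] rec : s) sParts.get(partition(rec[0], parallelism)).add(rec);
        // Phase 2: each instance joins its local partitions independently.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) result.addAll(localHashJoin(rParts.get(i), sParts.get(i)));
        System.out.println(result);  // prints [1:10,100] -- only key 1 appears in both inputs
    }
}
```

Because both inputs are partitioned with the same hash function, every valid join pair meets on exactly one instance, which is the correctness condition the post states for ship strategies.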

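Editor's note: likewise, the Sort-Merge-Join local strategy named in the post boils down to sorting both local partitions on the join key and merging them with two cursors, emitting the cross product of each group of equal keys. Again a hypothetical standalone sketch under the same `int[]{key, value}` record convention, not Flink's actual implementation:

```java
import java.util.*;

// Hypothetical sketch of the Sort-Merge-Join local strategy.
public class SortMergeJoinSketch {

    static List<String> sortMergeJoin(List<int[]> r, List<int[]> s) {
        r = new ArrayList<>(r); s = new ArrayList<>(s);   // copy: inputs may be immutable
        r.sort(Comparator.comparingInt(a -> a[0]));       // sort phase
        s.sort(Comparator.comparingInt(a -> a[0]));
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < r.size() && j < s.size()) {            // merge phase
            int key = r.get(i)[0];
            if (key < s.get(j)[0]) i++;
            else if (key > s.get(j)[0]) j++;
            else {
                // Equal keys: emit the cross product of both key groups.
                int jStart = j;
                while (i < r.size() && r.get(i)[0] == key) {
                    for (j = jStart; j < s.size() && s.get(j)[0] == key; j++)
                        out.add(key + ":" + r.get(i)[1] + "," + s.get(j)[1]);
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> r = List.of(new int[]{2, 20}, new int[]{1, 10}, new int[]{2, 21});
        List<int[]> s = List.of(new int[]{2, 200}, new int[]{3, 300});
        System.out.println(sortMergeJoin(r, s));  // prints [2:20,200, 2:21,200]
    }
}
```

The sort dominates the cost when everything fits in memory; when it does not, a system like Flink spills sorted runs to disk with its managed memory segments, which is why the post stresses graceful degradation.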