This is an automated email from the ASF dual-hosted git repository. liyu pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit a5fea70a283d19e9c6573797ad2beab161da9f78
Author: Jeff Zhang <[email protected]>
AuthorDate: Tue Jun 2 12:50:20 2020 +0800

    [blog] flink on zeppelin - part2

    Co-authored-by: morsapaes <[email protected]>

    This closes #344.
---
 _posts/2020-06-23-flink-on-zeppelin-part2.md | 109 +++++++++++++++++++++
 .../flink_append_mode.gif                    | Bin 0 -> 294307 bytes
 .../flink_python_udf.png                     | Bin 0 -> 83093 bytes
 .../flink_scala_udf.png                      | Bin 0 -> 84516 bytes
 .../flink_single_mode.gif                    | Bin 0 -> 58198 bytes
 .../flink_update_mode.gif                    | Bin 0 -> 131055 bytes
 6 files changed, 109 insertions(+)

diff --git a/_posts/2020-06-23-flink-on-zeppelin-part2.md b/_posts/2020-06-23-flink-on-zeppelin-part2.md
new file mode 100644
index 0000000..782e74c
--- /dev/null
+++ b/_posts/2020-06-23-flink-on-zeppelin-part2.md
@@ -0,0 +1,109 @@
+---
+layout: post
+title: "Flink on Zeppelin Notebooks for Interactive Data Analysis - Part 2"
+date: 2020-06-23T08:00:00.000Z
+categories: ecosystem
+authors:
+- zjffdu:
+  name: "Jeff Zhang"
+  twitter: "zjffdu"
+---
+
+In a previous post, we introduced the basics of Flink on Zeppelin and how to do Streaming ETL. In this second part of the "Flink on Zeppelin" series of posts, I will share how to
+perform streaming data visualization via Flink on Zeppelin and how to use Apache Flink UDFs in Zeppelin.
+
+# Streaming Data Visualization
+
+With [Zeppelin](https://zeppelin.apache.org/), you can build a real-time streaming dashboard without writing a single line of JavaScript, HTML or CSS code.
+
+Overall, Zeppelin supports three kinds of streaming data analytics:
+
+* Single Mode
+* Update Mode
+* Append Mode
+
+### Single Mode
+Single mode is for cases where the result of a SQL statement is always one row, as in the following example.
+The output is rendered as HTML, and you can set the paragraph local property `template` to define the final output content.
+In the template, you can use `{i}` as a placeholder for the i-th column of the result.
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_single_mode.gif" width="80%" alt="Single Mode"/>
+</center>
+
+### Update Mode
+Update mode is suitable for cases where the output is more than one row
+and is continuously updated. Here’s one example where we use ``GROUP BY``.
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_update_mode.gif" width="80%" alt="Update Mode"/>
+</center>
+
+### Append Mode
+Append mode is suitable for cases where new rows are only ever appended to the output.
+For instance, the example below uses a tumble window.
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_append_mode.gif" width="80%" alt="Append Mode"/>
+</center>
+
+# UDF
+
+SQL is a very powerful language, especially for expressing data flow. But most of the time, you need to handle complicated business logic that cannot be expressed in SQL.
+In these cases, UDFs (user-defined functions) come in particularly handy. In Zeppelin, you can write Scala or Python UDFs, and you can also import Scala, Python and Java UDFs.
+Here are two examples, one Scala UDF and one Python UDF:
+
+* Scala UDF
+
+```scala
+%flink
+
+class ScalaUpper extends ScalarFunction {
+  def eval(str: String) = str.toUpperCase
+}
+btenv.registerFunction("scala_upper", new ScalaUpper())
+
+```
+
+* Python UDF
+
+```python
+
+%flink.pyflink
+
+class PythonUpper(ScalarFunction):
+  def eval(self, s):
+    return s.upper()
+
+bt_env.register_function("python_upper", udf(PythonUpper(), DataTypes.STRING(), DataTypes.STRING()))
+
+```
+
+After you define the UDFs, you can use them directly in SQL:
+
+* Use Scala UDF in SQL
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_scala_udf.png" width="100%" alt="Scala UDF"/>
+</center>
+
+* Use Python UDF in SQL
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_python_udf.png" width="100%" alt="Python UDF"/>
+</center>
+
+# Summary
+
+In this post, we explained how to perform streaming data visualization via Flink on Zeppelin and how to use UDFs.
+Beyond that, you can do more in Zeppelin with Flink, such as batch processing and Hive integration.
+See the articles listed below for more details, along with this list of [Flink on Zeppelin tutorial videos](https://www.youtube.com/watch?v=YxPo0Fosjjg&list=PL4oy12nnS7FFtg3KV1iS5vDb0pTz12VcX) for your reference.
+
+# References
+
+* [Apache Zeppelin official website](http://zeppelin.apache.org)
+* Flink on Zeppelin tutorials - [Part 1](https://medium.com/@zjffdu/flink-on-zeppelin-part-1-get-started-2591aaa6aa47)
+* Flink on Zeppelin tutorials - [Part 2](https://medium.com/@zjffdu/flink-on-zeppelin-part-2-batch-711731df5ad9)
+* Flink on Zeppelin tutorials - [Part 3](https://medium.com/@zjffdu/flink-on-zeppelin-part-3-streaming-5fca1e16754)
+* Flink on Zeppelin tutorials - [Part 4](https://medium.com/@zjffdu/flink-on-zeppelin-part-4-advanced-usage-998b74908cd9)
+* [Flink on Zeppelin tutorial videos](https://www.youtube.com/watch?v=YxPo0Fosjjg&list=PL4oy12nnS7FFtg3KV1iS5vDb0pTz12VcX)
diff --git a/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_append_mode.gif b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_append_mode.gif
new file mode 100644
index 0000000..3c827f4
Binary files /dev/null and b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_append_mode.gif differ
diff --git a/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_python_udf.png b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_python_udf.png
new file mode 100644
index 0000000..e4caaf5
Binary files /dev/null and b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_python_udf.png differ
diff --git a/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_scala_udf.png b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_scala_udf.png
new file mode 100644
index 0000000..4448ad1
Binary files /dev/null and b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_scala_udf.png differ
diff --git a/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_single_mode.gif b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_single_mode.gif
new file mode 100644
index 0000000..91b49ed
Binary files /dev/null and b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_single_mode.gif differ
diff --git a/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_update_mode.gif b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_update_mode.gif
new file mode 100644
index 0000000..fe7e2e9
Binary files /dev/null and b/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_update_mode.gif differ
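
The `{i}` placeholder substitution that the post describes under Single Mode can be illustrated outside Zeppelin. The following is a minimal Python sketch, not Zeppelin's actual implementation: the helper name `render_single_row`, the sample template and the sample row are all hypothetical, and the sketch assumes that column indices are zero-based.

```python
import re


def render_single_row(template, row):
    """Fill each {i} placeholder in `template` with the i-th column of `row`.

    Hypothetical helper for illustration only; assumes zero-based indices.
    """
    def substitute(match):
        index = int(match.group(1))
        return str(row[index])

    # Match placeholders such as {0} or {12}; all other text is left untouched.
    return re.sub(r"\{(\d+)\}", substitute, template)


# Sample one-row result: a latest timestamp and a running count (made-up values).
row = ["2020-06-23 08:00:00", 42]
html = render_single_row("<h1>{1}</h1> until <h2>{0}</h2>", row)
print(html)  # <h1>42</h1> until <h2>2020-06-23 08:00:00</h2>
```

Each time the one-row result changes, re-rendering the same template with the new row yields the refreshed HTML, which is the effect single mode gives a dashboard.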
