This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 64b8b9d60cbe4f95c1a5bbf7f66b0499b286f5bd
Author: Jean-Baptiste Onofré <[email protected]>
AuthorDate: Wed Jan 10 16:39:49 2018 +0100

    Add 2017 look back blog post
---
 src/_data/authors.yml | 8 ++
 src/_posts/2018-01-09-beam-a-look-back.md | 132 ++++++++++++++++++++++++++++
 src/images/blog/2017-look-back/timeline.png | Bin 0 -> 12454 bytes
 3 files changed, 140 insertions(+)

diff --git a/src/_data/authors.yml b/src/_data/authors.yml
index d8cd836..0980cb5 100644
--- a/src/_data/authors.yml
+++ b/src/_data/authors.yml
@@ -47,3 +47,11 @@ jkff:
  name: Eugene Kirpichov
  email: [email protected]
  twitter:
+jbonofre:
+ name: Jean-Baptiste Onofré
+ email: [email protected]
+ twitter: jbonofre
+ianand:
+ name: Anand Iyer
+ email: [email protected]
+ twitter:
diff --git a/src/_posts/2018-01-09-beam-a-look-back.md b/src/_posts/2018-01-09-beam-a-look-back.md
new file mode 100644
index 0000000..d7643dc
--- /dev/null
+++ b/src/_posts/2018-01-09-beam-a-look-back.md
@@ -0,0 +1,132 @@
+---
+layout: post
+title: "Apache Beam: A Look Back at 2017"
+date: 2018-01-09 00:00:01 -0800
+excerpt_separator: <!--more-->
+categories: blog
+authors:
+ - ianand
+ - jbonofre
+---
+
+On January 10, 2017, Apache Beam got [promoted]({{ site.baseurl }}/blog/2017/01/10/beam-graduates.html)
+as a Top-Level Apache Software Foundation project. It was an important milestone
+that validated the value of the project, legitimacy of its community, and
+heralded its growing adoption. In the past year, Apache Beam has been on a
+phenomenal growth trajectory, with significant growth in its community and
+feature set. Let us walk you through some of the notable achievements.
+
+<!--more-->
+
+## Use cases
+
+First, let's take a glimpse at how Beam was used in 2017. Apache Beam, being a
+unified framework for batch and stream processing, enables a very wide spectrum
+of diverse use cases. Here are some use cases that exemplify the versatility of
+Beam.
+
+<img class="center-block"
+ src="{{ site.baseurl }}/images/blog/2017-look-back/timeline.png"
+ alt="Use Cases"
+ width="600">
+
+## Community growth
+
+In 2017, Apache Beam had 174 contributors worldwide, from many different
+organizations. As an Apache project, we are proud to count 18 PMC members and
+31 committers. The community had 7 releases in 2017, each bringing a rich set of
+new features and fixes.
+
+The most obvious and encouraging sign of the growth of Apache Beam’s community,
+and validation of its core value proposition of portability, is the addition of
+significant new [runners]({{ site.baseurl }}/documentation/runners/capability-matrix/)
+(i.e. execution engines). We entered 2017 with Apache Flink, Apache Spark 1.x,
+Google Cloud Dataflow, Apache Apex, and Apache Gearpump. In 2017, the following
+new and updated runners were developed:
+
+ - Apache Spark 2.x update
+ - [IBM Streams runner](https://www.ibm.com/blogs/bluemix/2017/10/streaming-analytics-updates-ibm-streams-runner-apache-beam-2-0/)
+ - MapReduce runner
+ - [JStorm runner](http://jstorm.io/)
+
+In addition to runners, Beam added new IO connectors, some notable ones being
+the Cassandra, MQTT, AMQP, HBase/HCatalog, JDBC, Solr, Tika, Redis, and
+ElasticSearch connectors. Beam’s IO connectors make it possible to read from or
+write to data sources/sinks even when they are not natively supported by the
+underlying execution engine. Beam also provides fully pluggable filesystem
+support, allowing us to support and extend our coverage to HDFS, S3, Azure
+Storage, and Google Storage. We continue to add new IO connectors and
+filesystems to extend the Beam use cases.
+
+A particularly telling sign of the maturity of an open source community is when
+it is able to collaborate with multiple other open source communities, and
+mutually improve the state of the art. Over the past few months, the Beam,
+Calcite, and Flink communities have come together to define a robust [spec](https://docs.google.com/document/d/1wrla8mF_mmq-NW9sdJHYVgMyZsgCmHumJJ5f5WUzTiM/edit)
+for Streaming SQL, with engineers from over four organizations contributing to
+it. If, like us, you are excited by the prospect of improving the state of
+streaming SQL, please join us!
+
+In addition to SQL, new XML and JSON based declarative DSLs are also in PoC.
+
+## Continued innovation
+
+Innovation is important to the success of any open source project, and Beam has
+a rich history of bringing innovative new ideas to the open source community.
+Apache Beam was the first to introduce some seminal concepts in the world of
+big-data processing:
+
+ - Unified batch and streaming SDK that enables users to author big-data jobs
+ without having to learn multiple disparate SDKs/APIs.
+ - Cross-Engine Portability: Giving enterprises the confidence that workloads
+ authored today will not have to be re-written when open source engines become
+ outdated and are supplanted by newer ones.
+ - [Semantics](https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101)
+ essential for reasoning about unbounded unordered data, and achieving
+ consistent and correct output from a streaming job.
+
+In 2017, the pace of innovation continued. The following capabilities were
+introduced:
+
+ - Cross-Language Portability framework, and a [Go](https://golang.org/) SDK
+ developed with it.
+ - Dynamically Shardable IO (SplittableDoFn)
+ - Support for schemas in PCollection, allowing us to extend the runner
+ capabilities.
+ - Extensions addressing new use cases such as machine learning, and new data
+ formats.
+
+## Areas of improvement
+
+Any retrospective view of a project is incomplete without an honest assessment
+of areas of improvement. Two aspects stand out:
+
+ - Helping runners showcase their individual strengths. After all, portability
+ does not imply homogeneity. Different runners have different areas in which
+ they excel, and we need to do a better job of helping them highlight their
+ strengths.
+ - Based on the previous point, helping customers make a more informed decision
+ when they select a runner or migrate from one to another.
+
+In 2018, we aim to take proactive steps to improve the above aspects.
+
+## Ethos of the project and its community
+
+The world of batch and stream big-data processing today is reminiscent of the
+[Tower of Babel](https://en.wikipedia.org/wiki/Tower_of_Babel) parable: a
+slowdown of progress because different communities spoke different languages.
+Similarly, today there are multiple disparate big-data SDKs/APIs, each with
+their own distinct terminology to describe similar concepts. The side effect is
+user confusion and slower adoption.
+
+The Apache Beam project aims to provide an industry standard portable SDK that
+will:
+
+ - Benefit users by providing ***innovation with stability***: The separation of
+ SDK and engine enables healthy competition between runners, without requiring
+ users to constantly learn new SDKs/APIs and rewrite their workloads to
+ benefit from new innovation.
+ - Benefit big-data engines by ***growing the pie for everyone***: Making it
+ easier for users to author, maintain, upgrade and migrate their big-data
+ workloads will lead to significant growth in the number of production
+ big-data deployments.
+
diff --git a/src/images/blog/2017-look-back/timeline.png b/src/images/blog/2017-look-back/timeline.png
new file mode 100644
index 0000000..0394cd8
Binary files /dev/null and b/src/images/blog/2017-look-back/timeline.png differ

-- 
To stop receiving notification emails like this one, please contact
[email protected].
