This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 243816b869f Publishing website 2023/10/17 19:15:46 at commit 36574ce
243816b869f is described below
commit 243816b869f058c413f76760d82b51af035e9878
Author: runner <runner@main-runner-x5s5p-lghf7>
AuthorDate: Tue Oct 17 19:15:46 2023 +0000
Publishing website 2023/10/17 19:15:46 at commit 36574ce
---
website/generated-content/case-studies/index.html |   5 +-
website/generated-content/case-studies/index.xml  | 260 ++++++++++++++++++---
.../case-studies/linkedin/index.html              |   5 +-
.../images/case-study/linkedin/bingfeng-xia.jpg   | Bin 0 -> 99417 bytes
.../images/case-study/linkedin/scheme-1.png       | Bin 0 -> 72662 bytes
.../images/case-study/linkedin/scheme-2.png       | Bin 0 -> 91660 bytes
.../images/case-study/linkedin/scheme-3.png       | Bin 0 -> 207951 bytes
.../images/case-study/linkedin/scheme-4.png       | Bin 0 -> 98569 bytes
.../images/case-study/linkedin/scheme-5.png       | Bin 0 -> 116720 bytes
.../images/case-study/linkedin/scheme-6.png       | Bin 0 -> 26758 bytes
.../images/case-study/linkedin/xinyu-liu.jpg      | Bin 0 -> 100350 bytes
website/generated-content/index.html              |   3 +-
website/generated-content/sitemap.xml             |   2 +-
13 files changed, 239 insertions(+), 36 deletions(-)
diff --git a/website/generated-content/case-studies/index.html b/website/generated-content/case-studies/index.html
index 47082be5a6c..6daac725ec3 100644
--- a/website/generated-content/case-studies/index.html
+++ b/website/generated-content/case-studies/index.html
@@ -37,7 +37,8 @@
<img class=banner-img-mobile
src=/images/banners/machine-learning/machine-learning-mobile.jpg alt="Machine
Learning"></a></div></div><div class=swiper-pagination></div></div><script
src=/js/swiper-bundle.min.min.e0e8f81b0b15728d35ff73c07f42ddbb17a108d6f23df4953cb3e60df7ade675.js></script>
<script
src=/js/sliders/top-banners.min.91104c476b3d8123ebee5ed9a8168556ec546abb698549551b38a0cee187ee1c.js></script>
<script>function showSearch(){addPlaceholder();var
e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function
addPlaceholder(){$("input:text").attr("placeholder","What are you looking
for?")}function endSearch(){var
e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function
blockScroll(){$("body").toggleClass(" [...]
-startups.</p><div class=case-study-list><div class=case-study-card><div
class=case-study-card-img><img src=/images/logos/powered-by/octo.png
loading=lazy></i></div><h3 class=case-study-card-title>High-Performing and
Efficient Transactional Data Processing for OCTO Technology’s Clients</h3><p
class=case-study-card-description>With Apache Beam, OCTO accelerated the
migration of one of France’s largest grocery retailers to streaming processing
for transactional data. By leveraging Apache Be [...]
+startups.</p><div class=case-study-list><div class=case-study-card><div
class=case-study-card-img><img src=/images/logos/powered-by/linkedin.png
loading=lazy></i></div><h3 class=case-study-card-title>Revolutionizing
Real-Time Stream Processing: 4 Trillion Events Daily at LinkedIn</h3><p
class=case-study-card-description>Apache Beam serves as the backbone of
LinkedIn's streaming infrastructure, handling the near real-time processing of
an astounding 4 trillion events daily through 3,000+ [...]
+<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-card><div class=case-study-card-img><img
src=/images/logos/powered-by/octo.png loading=lazy></i></div><h3
class=case-study-card-title>High-Performing and Efficient Transactional Data
Processing for OCTO Technology’s Clients</h3><p
class=case-study-card-description>With Apache Beam, OCTO accelerated the
migration of one of France’s largest grocery retailers to streaming processing
for transactional [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-card><div class=case-study-card-img><img
src=/images/logos/powered-by/hsbc.png loading=lazy></i></div><h3
class=case-study-card-title>High-Performance Quantitative Risk Analysis with
Apache Beam at HSBC</h3><p class=case-study-card-description>HSBC finds Apache
Beam to be more than a data processing framework. It is also a computational
platform and a risk engine that allowed for 100x scaling and [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-card><div class=case-study-card-img><img
src=/images/logos/powered-by/project_shield.png loading=lazy></i></div><h3
class=case-study-card-title>Efficient Streaming Analytics: Making the Web a
Safer Place with Project Shield</h3><p
class=case-study-card-description>Project Shield defends the websites of over
3K vulnerable organizations in >150 countries against DDoS attacks with the
mission of prot [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-card><div class=case-study-card-img><img
src=/images/logos/powered-by/booking.png loading=lazy></i></div><h3
class=case-study-card-title>Mass Ad Bidding With Beam at Booking.com</h3><p
class=case-study-card-description>Apache Beam powers Booking.com’s global ads
bidding and performance infrastructure, supporting 1M+ queries monthly for
workflows across multiple data systems scanning 2 PB+ of analy [...]
@@ -48,7 +49,7 @@ startups.</p><div class=case-study-list><div
class=case-study-card><div class=ca
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-card><div class=case-study-card-img><img
src=/images/logos/powered-by/hop.png loading=lazy></i></div><h3
class=case-study-card-title>Visual Apache Beam Pipeline Design and
Orchestration with Apache Hop</h3><p class=case-study-card-description>Apache
Hop is an open source data orchestration and engineering platform that extends
Apache Beam with visual pipeline lifecycle management. Neo4j’s Chief So [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-card><div class=case-study-card-img><img
src=/images/logos/powered-by/seznam.png loading=lazy></i></div><h3
class=case-study-card-title>Scalability and Cost Optimization for Search
Engine's Workloads</h3><p class=case-study-card-description>Dive into the Czech
search engine’s experience of scaling the on-premises infrastructure to learn
more about the benefits of byte-based data shuffling and the [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-card><div class=case-study-card-img><img
src=/images/logos/powered-by/ricardo.png loading=lazy></i></div><h3
class=case-study-card-title>Four Apache Technologies Combined for Fun and
Profit</h3><p class=case-study-card-description>Ricardo, the largest online
marketplace in Switzerland, uses Apache Beam to stream-process platform data
and enables the Data Intelligence team to provide scalable data [...]
-<img src=/images/arrow-right.svg alt="Go to the case
study"></a></div></div><div class=case-study-row-button-container><a
href=https://github.com/apache/beam/blob/master/website/ADD_CASE_STUDY.md
class=case-study-primary-button target=_blank rel="noopener noreferrer">Share
your story</a></div><h2 class=case-study-h2 id=logos>Also used by</h2><div
class="case-study-list case-study-list--additional"><a
class="case-study-used-by-card--responsive case-study-used-by-card
case-study-used-by-ca [...]
+<img src=/images/arrow-right.svg alt="Go to the case
study"></a></div></div><div class=case-study-row-button-container><a
href=https://github.com/apache/beam/blob/master/website/ADD_CASE_STUDY.md
class=case-study-primary-button target=_blank rel="noopener noreferrer">Share
your story</a></div><h2 class=case-study-h2 id=logos>Also used by</h2><div
class="case-study-list case-study-list--additional"><a
class="case-study-used-by-card--responsive case-study-used-by-card
case-study-used-by-ca [...]
<span>Add your logo</span></a></div></div><script type=text/javascript
src=/js/shuffle-elements.min.7c3e0074d9a55607c6ae854a05ff922cd14df08858b4e0c9752b5836b7c2ba38.js
defer></script></div></div><footer class=footer><div
class=footer__contained><div class=footer__cols><div class="footer__cols__col
footer__cols__col__logos"><div class=footer__cols__col__logo><img
src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div
class=footer__cols__col__logo><img src=/images/a [...]
<a href=https://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
diff --git a/website/generated-content/case-studies/index.xml b/website/generated-content/case-studies/index.xml
index db7c8bbad24..112cfc6541a 100644
--- a/website/generated-content/case-studies/index.xml
+++ b/website/generated-content/case-studies/index.xml
@@ -142,6 +142,235 @@ Data Architect @ OCTO Technology
</div>
</div>
</div>
+<div
class="clear-nav"></div></description></item><item><title>Case-Studies:
Revolutionizing Real-Time Stream Processing: 4 Trillion Events Daily at
LinkedIn</title><link>/case-studies/linkedin/</link><pubDate>Thu, 10 Aug 2023
00:12:00 +0000</pubDate><guid>/case-studies/linkedin/</guid><description>
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+<div class="case-study-opinion">
+<div class="case-study-opinion-img">
+<img src="/images/logos/powered-by/linkedin.png"/>
+</div>
+<blockquote class="case-study-quote-block">
+<p class="case-study-quote-text">
+“Apache Beam empowers LinkedIn to create timely recommendations and
personalized experiences by leveraging the freshest data and processing it in
real-time, ultimately benefiting LinkedIn's vast network of over 950 million
members worldwide.”
+</p>
+<div class="case-study-quote-author">
+<div class="case-study-quote-author-img">
+<img src="/images/case-study/linkedin/bingfeng-xia.jpg">
+</div>
+<div class="case-study-quote-author-info">
+<div class="case-study-quote-author-name">
+Bingfeng Xia
+</div>
+<div class="case-study-quote-author-position">
+Engineering Manager @LinkedIn
+</div>
+</div>
+</div>
+</blockquote>
+</div>
+<div class="case-study-post">
+<h1
id="revolutionizing-real-time-stream-processing-4-trillion-events-daily-at-linkedin">Revolutionizing
Real-Time Stream Processing: 4 Trillion Events Daily at LinkedIn</h1>
+<h2 id="background">Background</h2>
+<p>At LinkedIn, Apache Beam plays a pivotal role in stream processing
infrastructures that process over 4 trillion events daily through more than
3,000 pipelines across multiple production data centers. This robust framework
empowers near real-time data processing for critical services and platforms,
ranging from machine learning and notifications to anti-abuse AI modeling. With
over 950 million members, ensuring that our platform is running smoothly is
critical to connecting members [...]
+<p>In this case study, LinkedIn&rsquo;s Bingfeng Xia, Engineering
Manager, and Xinyu Liu, Senior Staff Engineer, shed light on how the Apache
Beam programming model&rsquo;s unified, portable, and user-friendly data
processing framework has enabled a multitude of sophisticated use cases and
revolutionized Stream Processing at LinkedIn. This technology has <a
href="https://engineering.linkedin.com/blog/2023/unified-streaming-and-batch-pipelines-at-linkedin--reducing-proc">opt
[...]
+<h2 id="linkedin-open-source-ecosystem-and-journey-to-beam">LinkedIn
Open-Source Ecosystem and Journey to Beam</h2>
+<p>LinkedIn has a rich history of actively contributing to the open-source
community, demonstrating its commitment by creating, managing, and utilizing
various open-source software projects. The LinkedIn engineering team has <a
href="https://engineering.linkedin.com/content/engineering/en-us/open-source">open-sourced
over 75 projects</a> across multiple categories, with several gaining
widespread adoption and becoming part of <a
href="https://www.apache.org/">the Apache Softw [...]
+<p>To enable the ingestion and real-time processing of enormous volumes of
data, LinkedIn built a custom stream processing ecosystem largely with tools
developed in-house (and subsequently open-sourced). In 2010, they introduced
<a href="https://kafka.apache.org/">Apache Kafka</a>, a pivotal Big Data
ingestion backbone for LinkedIn’s real-time infrastructure. To transition from
batch-oriented processing and respond to Kafka events within minutes or
seconds, they built an in-hous [...]
+<p>Though the stream processing ecosystem with Apache Samza at its core
enabled large-scale stateful data processing, LinkedIn’s ever-evolving demands
required higher scalability and efficiency, as well as lower latency for the
streaming pipelines. The lambda architecture approach led to operational
complexity and inefficiencies, because it required maintaining two different
codebases and two different engines for batch and streaming data. To address
these challenges, data engineers s [...]
+<p>The release of <a href="/about/">Apache Beam</a> in 2016 proved to
be a game-changer for LinkedIn. Apache Beam offers an open-source, advanced
unified programming model for both batch and Stream Processing, making it
possible to create a large-scale common data infrastructure across various
applications. With support for Python, Go, and Java SDKs and a rich, versatile
API layer, Apache Beam provided the ideal solution for building sophisticated
multi-language pipelines and ru [...]
+<blockquote class="case-study-quote-block case-study-quote-wrapped">
+<p class="case-study-quote-text">
+When we started looking at Apache Beam, we realized it was a very attractive
data processing framework for LinkedIn’s demands: not only does it provide an
advanced API, but it also allows for converging stream and batch processing and
multi-language support. Everything we were looking for and out-of-the-box.
+</p>
+<div class="case-study-quote-author">
+<div class="case-study-quote-author-img">
+<img src="/images/case-study/linkedin/xinyu-liu.jpg">
+</div>
+<div class="case-study-quote-author-info">
+<div class="case-study-quote-author-name">
+Xinyu Liu
+</div>
+<div class="case-study-quote-author-position">
+Senior Staff Engineer @LinkedIn
+</div>
+</div>
+</div>
+</blockquote>
+<p>Recognizing the advantages of Apache Beam&rsquo;s unified data
processing API, advanced capabilities, and multi-language support, LinkedIn
began onboarding its first use cases and developed the <a
href="/documentation/runners/samza/">Apache Samza runner for Beam</a> in
2018. By 2019, Apache Beam pipelines were powering several critical use cases,
and the programming model and framework saw extensive adoption across LinkedIn
teams. Xinyu Liu showcased the benefits of migra [...]
+<div class="post-scheme">
+<a href="/images/case-study/linkedin/scheme-1.png" target="_blank"
title="Click to enlarge">
+<img src="/images/case-study/linkedin/scheme-1.png" alt="scheme">
+</a>
+</div>
+<h2 id="apache-beam-use-cases-at-linkedin">Apache Beam Use Cases at
LinkedIn</h2>
+<h3 id="unified-streaming-and-batch-pipelines">Unified Streaming And Batch
Pipelines</h3>
+<p>Some of the first use cases that LinkedIn migrated to Apache Beam
pipelines involved both real-time computations and periodic backfilling. One
example was LinkedIn&rsquo;s standardization process. Standardization
consists of a series of pipelines that use complex AI models to map LinkedIn
user inputs, such as job titles, skills, or education history, into predefined
internal IDs. For example, a LinkedIn member who lists their current position
as &ldquo;Chief Data Scientist& [...]
+<p>LinkedIn&rsquo;s standardization process requires both real-time
processing to reflect immediate user updates and periodic backfilling to
refresh data when new AI models are introduced. Before adopting Apache Beam,
running backfilling as a streaming job required over 5,000 GB-hours in memory
and nearly 4,000 hours in total CPU time. This heavy load led to extended
backfilling times and scaling issues, causing the backfilling pipeline to act
as a &ldquo;noisy neighbor&rd [...]
+<blockquote class="case-study-quote-block case-study-quote-wrapped">
+<p class="case-study-quote-text">
+We came to the question: is it possible to only maintain one codebase but with
the ability to run it as either a batch job or streaming job? The unified
Apache Beam model was the solution.
+</p>
+<div class="case-study-quote-author">
+<div class="case-study-quote-author-img">
+<img src="/images/case-study/linkedin/bingfeng-xia.jpg">
+</div>
+<div class="case-study-quote-author-info">
+<div class="case-study-quote-author-name">
+Bingfeng Xia
+</div>
+<div class="case-study-quote-author-position">
+Engineering Manager @LinkedIn
+</div>
+</div>
+</div>
+</blockquote>
+<p>The Apache Beam APIs enabled LinkedIn engineers to implement business
logic once within a unified Apache Beam pipeline that efficiently handles both
real-time standardization and backfilling. Apache Beam offers <a
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/options/PipelineOptions.html">PipelineOptions</a>,
enabling the configuration and customization of various aspects, such as the
pipeline runner and runner-specific configurations. The extensi [...]
+<div class="post-scheme">
+<a href="/images/case-study/linkedin/scheme-2.png" target="_blank"
title="Click to enlarge">
+<img src="/images/case-study/linkedin/scheme-2.png" alt="scheme">
+</a>
+</div>
+<p>Hundreds of streaming Apache Beam jobs now power real-time
standardization, listening to events 24/7, enriching streams with additional
data from remote tables, performing necessary processing, and writing results
to output databases. The batch Apache Beam backfilling job runs weekly,
effectively handling 950 million member profiles at a rate of over 40,000
profiles per second. Apache Beam infers data points into sophisticated AI and
machine learning models and joins complex data s [...]
+<p>The migration of backfilling logic to a unified Apache Beam pipeline and
its execution in batch mode resulted in a significant 50% improvement in memory
and CPU usage efficiency (from ~5000 GB-hours and ~4000 CPU hours to ~2000
GB-hours and ~1700 CPU hours) and an impressive 94% acceleration in processing
time (from 7.5 hours to 25 minutes). More details about this use case can be
found on <a
href="https://engineering.linkedin.com/blog/2023/unified-streaming-and-batch-pipelines-
[...]
+<h3 id="anti-abuse--near-real-time-ai-modeling">Anti-Abuse &amp; Near
Real-Time AI Modeling</h3>
+<p>LinkedIn is firmly committed to creating a trusted environment for its
members, and this dedication extends to safeguarding against various types of
abuse on the platform. To achieve this, the Anti-Abuse AI Team at LinkedIn
plays a crucial role in creating, deploying, and maintaining AI and deep
learning models that can detect and prevent different forms of abuse, such as
fake account creation, member profile scraping, automated spam, and account
takeovers.</p>
+<p>Apache Beam fortifies LinkedIn’s internal anti-abuse platform, Chronos,
enabling abuse detection and prevention in near real-time. Chronos relies on
two streaming Apache Beam pipelines: the Filter pipeline and the Model
pipeline. The Filter pipeline reads user activity events from Kafka, extracts
relevant fields, aggregates and filters the events, and then generates filtered
Kafka messages for downstream AI processing. Subsequently, the Model pipeline
consumes these filtered messag [...]
+<div class="post-scheme">
+<a href="/images/case-study/linkedin/scheme-3.png" target="_blank"
title="Click to enlarge">
+<img src="/images/case-study/linkedin/scheme-3.png" alt="scheme">
+</a>
+</div>
+<p>The flexibility of Apache Beam&rsquo;s pluggable architecture and
the availability of various I/O options seamlessly integrated the anti-abuse
pipelines with Kafka and key-value stores. LinkedIn has dramatically reduced
the time it takes to label abusive actions, cutting it down from 1 day to just
5 minutes and processing time-series events at an impressive rate of over 3
million queries per second. Apache Beam empowered near real-time processing,
significantly bolstering Linke [...]
+<blockquote class="case-study-quote-block case-study-quote-wrapped">
+<p class="case-study-quote-text">
+Apache Beam enabled revolutionary, phenomenal performance improvements - the
anti-abuse processing accelerated from 1 day to 5 minutes. We have seen more
than 6% improvement in detecting logged-in scraping profiles.
+</p>
+<div class="case-study-quote-author">
+<div class="case-study-quote-author-img">
+<img src="/images/case-study/linkedin/xinyu-liu.jpg">
+</div>
+<div class="case-study-quote-author-info">
+<div class="case-study-quote-author-name">
+Xinyu Liu
+</div>
+<div class="case-study-quote-author-position">
+Senior Staff Engineer @LinkedIn
+</div>
+</div>
+</div>
+</blockquote>
+<h3 id="notifications-platform">Notifications Platform</h3>
+<p>As a social media network, LinkedIn heavily relies on instant
notifications to drive member engagement. To achieve this, Apache Beam and
Apache Samza together power LinkedIn’s large-scale Notifications Platform that
generates notification content, pinpoints the target audience, and ensures the
timely and relevant distribution of content.</p>
+<p>The streaming Apache Beam pipelines have intricate business logic and
handle enormous volumes of data in a near real-time fashion. The pipelines
consume, aggregate, partition, and process events from over 950 million
LinkedIn members and feed the data to downstream machine learning models. The
ML models perform distributed targeting and scalable scoring on the order of
millions of candidate notifications per second based on the recipient member’s
historical actions and make persona [...]
+<p>The advanced Apache Beam API offers complex aggregation and filtering
capabilities out-of-the-box, and its programming model allows for the creation
of reusable components. These features enable LinkedIn to expedite development
and streamline the scaling of the Notifications platform as they transition
more notification use cases from Samza to Beam pipelines.</p>
+<blockquote class="case-study-quote-block case-study-quote-wrapped">
+<p class="case-study-quote-text">
+LinkedIn’s user engagement is greatly driven by how timely we can send
relevant notifications. Apache Beam enabled a scalable, near real-time
infrastructure behind this business-critical use case.
+</p>
+<div class="case-study-quote-author">
+<div class="case-study-quote-author-img">
+<img src="/images/case-study/linkedin/bingfeng-xia.jpg">
+</div>
+<div class="case-study-quote-author-info">
+<div class="case-study-quote-author-name">
+Bingfeng Xia
+</div>
+<div class="case-study-quote-author-position">
+Engineering Manager @LinkedIn
+</div>
+</div>
+</div>
+</blockquote>
+<h3 id="real-time-ml-feature-generation">Real-Time ML Feature
Generation</h3>
+<p>LinkedIn&rsquo;s core functionalities, such as job recommendations
and search feed, heavily rely on ML models that consume thousands of features
related to various entities like companies, job postings, and members. However,
before the adoption of Apache Beam, the original offline ML feature generation
pipeline suffered from a delay of 24 to 48 hours between member actions and the
impact of those actions on the recommendation system. This delay resulted in
missed opportunities, [...]
+<p>Using Managed Beam as the foundation, LinkedIn developed a hosted
platform for ML feature generation. The ML platform provides AI engineers with
real-time features and an efficient pipeline authoring experience, all while
abstracting away deployment and operational complexities. AI engineers create
feature definitions and deploy them using Managed Beam. When LinkedIn members
take actions on the platform, the streaming Apache Beam pipeline generates
fresher machine learning features [...]
+<div class="post-scheme">
+<a href="/images/case-study/linkedin/scheme-4.png" target="_blank"
title="Click to enlarge">
+<img src="/images/case-study/linkedin/scheme-4.png" alt="scheme">
+</a>
+</div>
+<p>The powerful Apache Beam Stream Processing platform played a pivotal
role in eliminating the delay between member actions and data availability,
achieving an impressive end-to-end pipeline latency of just a few seconds. This
significant improvement allowed LinkedIn&rsquo;s ML models to take
advantage of up-to-date information and deliver more personalized and timely
recommendations to our members, leading to significant gains in business
metrics.</p>
+<h3 id="managed-stream-processing-platform">Managed Stream Processing
Platform</h3>
+<p>As LinkedIn&rsquo;s data infrastructure grew to encompass over 3,000
Apache Beam pipelines, catering to a diverse range of business use cases,
LinkedIn&rsquo;s AI and data engineering teams found themselves overwhelmed
with managing these streaming applications 24/7. The AI engineers encountered
several technical challenges while creating new pipelines, including the
intricacy of integrating multiple streaming tools and infrastructures into
their frameworks, and limited kno [...]
+<div class="post-scheme">
+<a href="/images/case-study/linkedin/scheme-5.png" target="_blank"
title="Click to enlarge">
+<img src="/images/case-study/linkedin/scheme-5.png" alt="scheme">
+</a>
+</div>
+<p>The Apache Beam SDK empowered LinkedIn engineers to create custom
workflow components as reusable sub-DAGs (Directed Acyclic Graphs) and expose
them as standard PTransforms. These PTransforms serve as ready-to-use building
blocks for new pipelines, significantly speeding up the authoring and testing
process for LinkedIn AI engineers. By abstracting the low-level details of
underlying engines and runtime environments, Apache Beam allows engineers to
focus solely on business logic, f [...]
+<p>When the pipelines are ready for deployment, Managed Beam&rsquo;s
central control plane comes into play, providing essential features like a
deployment UI, operational dashboard, administrative tools, and automated
pipeline lifecycle management.</p>
+<p>Apache Beam&rsquo;s abstraction facilitated the isolation of user
code from framework evolution during build, deployment, and runtime. To ensure
the separation of runner processes from user-defined functions (UDFs), Managed
Beam packages the pipeline business logic and the framework logic as two
separate JAR files: framework-less artifacts and framework artifacts. During
pipeline execution on a YARN cluster, these pipeline artifacts run in a Samza
container as two distinct proc [...]
+<div class="post-scheme">
+<a href="/images/case-study/linkedin/scheme-6.png" target="_blank"
title="Click to enlarge">
+<img src="/images/case-study/linkedin/scheme-6.png" alt="scheme">
+</a>
+</div>
+<p>Apache Beam also underpinned Managed Beam&rsquo;s autosizing
controller tool, which automates hardware resource tuning and provides
auto-remediation for streaming pipelines. Streaming Apache Beam pipelines
self-report diagnostic information, such as metrics and key deployment logs, in
the form of Kafka topics. Additionally, LinkedIn&rsquo;s internal
monitoring tools report runtime errors, such as heartbeat failures,
out-of-memory events, and processing lags. The Apache Beam [...]
+<blockquote class="case-study-quote-block case-study-quote-wrapped">
+<p class="case-study-quote-text">
+Apache Beam helped streamline operations management and enabled
fully-automated autoscaling, significantly reducing the time to onboard new
applications. Previously, onboarding required a lot of manual 'trial and error'
iterations and deep knowledge of the internal system and metrics.
+</p>
+<div class="case-study-quote-author">
+<div class="case-study-quote-author-img">
+<img src="/images/case-study/linkedin/bingfeng-xia.jpg">
+</div>
+<div class="case-study-quote-author-info">
+<div class="case-study-quote-author-name">
+Bingfeng Xia
+</div>
+<div class="case-study-quote-author-position">
+Engineering Manager @LinkedIn
+</div>
+</div>
+</div>
+</blockquote>
+<p>The extensibility, pluggability, portability, and abstraction of Apache
Beam formed the backbone of LinkedIn&rsquo;s Managed Beam platform. The
Managed Beam platform accelerated the time to author, test, and stabilize
streaming pipelines from months to days, facilitated fast experimentation, and
almost entirely eliminated operational costs for AI engineers.</p>
+<h2 id="summary">Summary</h2>
+<p>Apache Beam played a pivotal role in revolutionizing and scaling
LinkedIn&rsquo;s data infrastructure. Beam&rsquo;s powerful streaming
capabilities enable real-time processing for critical business use cases, at a
scale of over 4 trillion events daily through more than 3,000 pipelines.</p>
+<p>The versatility of Apache Beam empowered LinkedIn’s engineering teams to
optimize their data processing for various business use cases:</p>
+<ul>
+<li>Apache Beam&rsquo;s unified and portable framework allowed LinkedIn
to consolidate streaming and batch processing into unified pipelines. These
unified pipelines resulted in a 2x optimization in cost-to-serve, a 2x
improvement in processing performance, and a 2x improvement in memory and CPU
usage efficiency.</li>
+<li>LinkedIn&rsquo;s anti-abuse platform leveraged Apache Beam to
process user activity events from Kafka in near-real-time, achieving a
remarkable acceleration from days to minutes in labeling abusive actions. The
nearline defenses are able to catch scrapers within minutes after they start to
scrape and this leads to more than 6% improvement in detecting logged-in
scraping profiles.</li>
+<li>By adopting Apache Beam, LinkedIn was able to transition from an
offline ML feature generation pipeline with a 24- to 48-hour delay to a
real-time platform with an end-to-end pipeline latency at the millisecond or
second level.</li>
+<li>Apache Beam’s abstraction and powerful programming model enabled
LinkedIn to create a fully managed stream processing platform, thus
facilitating easier authoring, testing, and deployment and accelerating
time-to-production for new pipelines from months to days.</li>
+</ul>
+<p>Apache Beam boasts seamless plug-and-play capabilities, integrating
smoothly with Apache Kafka, Apache Pinot, and other core technologies at
LinkedIn, all while ensuring optimal performance at scale. As LinkedIn
continues experimenting with new engines and tooling, the Apache Beam
portability future-proofs our ecosystem against any changes in the underlying
infrastructure.</p>
+<blockquote class="case-study-quote-block case-study-quote-wrapped">
+<p class="case-study-quote-text">
+By enabling a scalable, near real-time infrastructure behind business-critical
use cases, Apache Beam empowers LinkedIn to leverage the freshest data and
process it in real-time to create timely recommendations and personalized
experiences, ultimately benefiting LinkedIn's vast network of over 950 million
members worldwide.
+</p>
+<div class="case-study-quote-author">
+<div class="case-study-quote-author-img">
+<img src="/images/case-study/linkedin/xinyu-liu.jpg">
+</div>
+<div class="case-study-quote-author-info">
+<div class="case-study-quote-author-name">
+Xinyu Liu
+</div>
+<div class="case-study-quote-author-position">
+Senior Staff Engineer @LinkedIn
+</div>
+</div>
+</div>
+</blockquote>
+<p><br><br></p>
+<div class="case-study-feedback" id="case-study-feedback">
+<p class="case-study-feedback-title">Was this information useful?</p>
+<div>
+<button class="btn case-study-feedback-btn"
onclick="sendCaseStudyFeedback(true, 'LinkedIn')">Yes</button>
+<button class="btn case-study-feedback-btn"
onclick="sendCaseStudyFeedback(false, 'LinkedIn')">No</button>
+</div>
+</div>
+</div>
<div
class="clear-nav"></div></description></item><item><title>Case-Studies:
High-Performance Quantitative Risk Analysis with Apache Beam at
HSBC</title><link>/case-studies/hsbc/</link><pubDate>Tue, 20 Jun 2023 00:12:00
+0000</pubDate><guid>/case-studies/hsbc/</guid><description>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
@@ -2453,33 +2682,4 @@ distributed under the License is distributed on an "AS
IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---></description></item><item><title>Case-Studies:
Kio</title><link>/case-studies/kio/</link><pubDate>Mon, 01 Jan 0001 00:00:00
+0000</pubDate><guid>/case-studies/kio/</guid><description>
-<!--
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-http://www.apache.org/licenses/LICENSE-2.0
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-<div class="case-study-post">
-<h1
id="kio-is-a-set-of-kotlin-extensions-for-apache-beam-to-implement-fluent-like-api-for-java-sdk">Kio
is a set of Kotlin extensions for Apache Beam to implement fluent-like API for
Java SDK.</h1>
-<h2 id="word-count-example">Word Count example</h2>
-<pre tabindex="0"><code>// Create Kio context
-val kio = Kio.fromArguments(args)
-// Configure a pipeline
-kio.read().text(&#34;~/input.txt&#34;)
-.map { it.toLowerCase() }
-.flatMap { it.split(&#34;\\W+&#34;.toRegex()) }
-.filter { it.isNotEmpty() }
-.countByValue()
-.forEach { println(it) }
-// And execute it
-kio.execute().waitUntilDone()
-</code></pre><h2 id="documentation">Documentation</h2>
-<p>For more information about Kio, please see the documentation here: <a
href="https://code.chermenin.ru/kio">https://code.chermenin.ru/kio</a>.</p>
-</div>
-<div class="clear-nav"></div></description></item></channel></rss>
\ No newline at end of file
+--></description></item></channel></rss>
\ No newline at end of file
diff --git a/website/generated-content/case-studies/linkedin/index.html
b/website/generated-content/case-studies/linkedin/index.html
index 5374a55285a..1200f62e670 100644
--- a/website/generated-content/case-studies/linkedin/index.html
+++ b/website/generated-content/case-studies/linkedin/index.html
@@ -1,4 +1,4 @@
-<!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta
http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport
content="width=device-width,initial-scale=1"><title>Linkedin</title><meta
name=description content="Apache Beam is an open source, unified model and set
of language-specific SDKs for defining and executing data processing workflows,
and also data ingestion and integration flows, supporting Enterprise
Integration Patterns (EIPs) and Domain Specific Lang [...]
+<!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta
http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport
content="width=device-width,initial-scale=1"><title>Revolutionizing Real-Time
Stream Processing: 4 Trillion Events Daily at LinkedIn</title><meta
name=description content="Apache Beam is an open source, unified model and set
of language-specific SDKs for defining and executing data processing workflows,
and also data ingestion and integration flows, su [...]
<script type=text/javascript
src=/js/language-switch-v2.min.121952b7980b920320ab229551857669209945e39b05ba2b433a565385ca44c6.js
defer></script>
<script type=text/javascript
src=/js/fix-menu.min.039174b67107465f2090a493f91e126f7aa797f29420f9edab8a54d9dd4b3d2d.js
defer></script>
<script type=text/javascript
src=/js/section-nav.min.1405fd5e70fab5f6c54037c269b1d137487d8f3d1b3009032525f6db3fbce991.js
defer></script>
@@ -36,7 +36,8 @@
<img class=banner-img-mobile
src=/images/banners/tour-of-beam/tour-of-beam-mobile.png alt="Start Tour of
Beam"></a></div><div class=swiper-slide><a
href=https://beam.apache.org/documentation/ml/overview/><img
class=banner-img-desktop
src=/images/banners/machine-learning/machine-learning-desktop.jpg alt="Machine
Learning">
<img class=banner-img-mobile
src=/images/banners/machine-learning/machine-learning-mobile.jpg alt="Machine
Learning"></a></div></div><div class=swiper-pagination></div></div><script
src=/js/swiper-bundle.min.min.e0e8f81b0b15728d35ff73c07f42ddbb17a108d6f23df4953cb3e60df7ade675.js></script>
<script
src=/js/sliders/top-banners.min.91104c476b3d8123ebee5ed9a8168556ec546abb698549551b38a0cee187ee1c.js></script>
-<script>function showSearch(){addPlaceholder();var
e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function
addPlaceholder(){$("input:text").attr("placeholder","What are you looking
for?")}function endSearch(){var
e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function
blockScroll(){$("body").toggleClass(" [...]
+<script>function showSearch(){addPlaceholder();var
e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function
addPlaceholder(){$("input:text").attr("placeholder","What are you looking
for?")}function endSearch(){var
e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function
blockScroll(){$("body").toggleClass(" [...]
+<button class="btn case-study-feedback-btn"
onclick='sendCaseStudyFeedback(!1,"LinkedIn")'>No</button></div></div></div><div
class=clear-nav></div></div></div></div></article></div></div><footer
class=footer><div class=footer__contained><div class=footer__cols><div
class="footer__cols__col footer__cols__col__logos"><div
class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg
class=footer__logo alt="Beam logo"></div><div
class=footer__cols__col__logo><img src=/images/apache_lo [...]
<a href=https://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></div><div class="footer__cols__col
footer__cols__col__logos"><div class=footer__cols__col--group><div
class=footer__cols__col__logo><a href=https://github.com/apache/beam><im [...]
\ No newline at end of file
diff --git
a/website/generated-content/images/case-study/linkedin/bingfeng-xia.jpg
b/website/generated-content/images/case-study/linkedin/bingfeng-xia.jpg
new file mode 100644
index 00000000000..ca07935b689
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/bingfeng-xia.jpg differ
diff --git a/website/generated-content/images/case-study/linkedin/scheme-1.png
b/website/generated-content/images/case-study/linkedin/scheme-1.png
new file mode 100644
index 00000000000..535f2d5f316
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/scheme-1.png differ
diff --git a/website/generated-content/images/case-study/linkedin/scheme-2.png
b/website/generated-content/images/case-study/linkedin/scheme-2.png
new file mode 100644
index 00000000000..2ab4e428101
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/scheme-2.png differ
diff --git a/website/generated-content/images/case-study/linkedin/scheme-3.png
b/website/generated-content/images/case-study/linkedin/scheme-3.png
new file mode 100644
index 00000000000..a7d1dd01b88
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/scheme-3.png differ
diff --git a/website/generated-content/images/case-study/linkedin/scheme-4.png
b/website/generated-content/images/case-study/linkedin/scheme-4.png
new file mode 100644
index 00000000000..3873b3a20b1
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/scheme-4.png differ
diff --git a/website/generated-content/images/case-study/linkedin/scheme-5.png
b/website/generated-content/images/case-study/linkedin/scheme-5.png
new file mode 100644
index 00000000000..e28537a18a8
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/scheme-5.png differ
diff --git a/website/generated-content/images/case-study/linkedin/scheme-6.png
b/website/generated-content/images/case-study/linkedin/scheme-6.png
new file mode 100644
index 00000000000..1dadc4c9126
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/scheme-6.png differ
diff --git a/website/generated-content/images/case-study/linkedin/xinyu-liu.jpg
b/website/generated-content/images/case-study/linkedin/xinyu-liu.jpg
new file mode 100644
index 00000000000..89813af2b09
Binary files /dev/null and
b/website/generated-content/images/case-study/linkedin/xinyu-liu.jpg differ
diff --git a/website/generated-content/index.html
b/website/generated-content/index.html
index 1a7ef494156..b814b62be15 100644
--- a/website/generated-content/index.html
+++ b/website/generated-content/index.html
@@ -35,7 +35,8 @@
<script
src=/js/sliders/top-banners.min.91104c476b3d8123ebee5ed9a8168556ec546abb698549551b38a0cee187ee1c.js></script>
<script>function showSearch(){addPlaceholder();var
e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function
addPlaceholder(){$("input:text").attr("placeholder","What are you looking
for?")}function endSearch(){var
e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function
blockScroll(){$("body").toggleClass(" [...]
<span>Link to GitHub Repo</span></button></a></div></div><div id=hero-mobile
class=hero-mobile><div class=hero-content><h3>Introducing Apache
Beam</h3><h1>The Unified Apache Beam Model</h1><h2>The easiest way to do batch
and streaming data processing. Write once, run anywhere data processing for
mission-critical production workloads.</h2></div></div><div class=ctas><div
class=ctas_row><a class=ctas_button href=/get-started/beam-overview/><img
src=images/info_icon.svg> Learn more</a></div [...]
-You can try the Apache Beam examples at <a
href=https://play.beam.apache.org/>Beam Playground</a>.</p><br><br><div
class=playground_or_image><a class=playground__mobile
href=https://play.beam.apache.org/><img src=images/playground.png alt="beam
playground"></a><div class=playground-wrapper><div
class=playground-snippets><div class="language-java playground-snippet"
data-sdk=java></div><div class="language-py playground-snippet"
data-sdk=python></div><div class="language-go playground-sni [...]
+You can try the Apache Beam examples at <a
href=https://play.beam.apache.org/>Beam Playground</a>.</p><br><br><div
class=playground_or_image><a class=playground__mobile
href=https://play.beam.apache.org/><img src=images/playground.png alt="beam
playground"></a><div class=playground-wrapper><div
class=playground-snippets><div class="language-java playground-snippet"
data-sdk=java></div><div class="language-py playground-snippet"
data-sdk=python></div><div class="language-go playground-sni [...]
+<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-row-button-container><a
href=https://github.com/apache/beam/blob/master/website/ADD_CASE_STUDY.md
class=case-study-primary-button target=_blank rel="noopener noreferrer">Share
your story</a></div><div class=quote-img-container><div class=quote-img><img
src=images/logos/powered-by/linkedin.png alt="Quote
Logo"></div></div></div></div></div><div class=swiper-slide><div
class=wrap-slide><div class=qu [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-row-button-container><a
href=https://github.com/apache/beam/blob/master/website/ADD_CASE_STUDY.md
class=case-study-primary-button target=_blank rel="noopener noreferrer">Share
your story</a></div><div class=quote-img-container><div class=quote-img><img
src=images/logos/powered-by/octo.png alt="Quote
Logo"></div></div></div></div></div><div class=swiper-slide><div
class=wrap-slide><div class=quote- [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-row-button-container><a
href=https://github.com/apache/beam/blob/master/website/ADD_CASE_STUDY.md
class=case-study-primary-button target=_blank rel="noopener noreferrer">Share
your story</a></div><div class=quote-img-container><div class=quote-img><img
src=images/logos/powered-by/hsbc.png alt="Quote
Logo"></div></div></div></div></div><div class=swiper-slide><div
class=wrap-slide><div class=quote- [...]
<img src=/images/arrow-right.svg alt="Go to the case study"></a></div><div
class=case-study-row-button-container><a
href=https://github.com/apache/beam/blob/master/website/ADD_CASE_STUDY.md
class=case-study-primary-button target=_blank rel="noopener noreferrer">Share
your story</a></div><div class=quote-img-container><div class=quote-img><img
src=images/logos/powered-by/project_shield.png alt="Quote
Logo"></div></div></div></div></div><div class=swiper-slide><div
class=wrap-slide><div cl [...]
diff --git a/website/generated-content/sitemap.xml
b/website/generated-content/sitemap.xml
index 8acdd63344a..b1caf137ab6 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.51.0/</loc><lastmod>2023-10-17T09:15:36-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2023-10-17T09:15:36-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2023-10-17T09:15:36-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2023-10-17T09:15:36-07:00</lastmod></url><url><loc>/catego
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.51.0/</loc><lastmod>2023-10-17T12:04:37-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2023-10-17T12:04:37-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2023-10-17T12:04:37-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2023-10-17T12:04:37-07:00</lastmod></url><url><loc>/catego
[...]
\ No newline at end of file