This is an automated email from the ASF dual-hosted git repository.
zkaoudi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-wayang-website.git
The following commit(s) were added to refs/heads/main by this push:
new 49b6d2de updating about
49b6d2de is described below
commit 49b6d2deef2a5a1a96b2080ecd91c861491f4dd2
Author: Zoi <[email protected]>
AuthorDate: Wed Jan 31 19:20:43 2024 +0100
updating about
---
docs/introduction/about.md | 39 +++++++++++++++++++++++++++-----
static/img/architecture/wayang-plan.png | Bin 579427 -> 1006279 bytes
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/docs/introduction/about.md b/docs/introduction/about.md
index b5683b43..4a158b87 100644
--- a/docs/introduction/about.md
+++ b/docs/introduction/about.md
@@ -25,11 +25,38 @@ sidebar_position: 1
Apache Wayang's three-layer architecture provides a strategic abstraction
between user applications and underlying data processing platforms, ensuring
seamless integration and optimization. The application layer encapsulates
application-specific logic, while the core layer acts as an intermediary,
translating application logic into a standardized intermediate representation
(WayangPlan). This standardized representation is then passed to the platform
layer, where it is optimized for exec [...]
-### Architecture
-Apache Wayang's unique architecture, unlike traditional DBMSs, decouples the
physical planning and execution layers, empowering developers to express their
data processing logic in a platform-agnostic fashion. This separation of
concerns allows developers to focus on the algorithmic aspects of their
applications without being constrained by the intricacies of specific
processing platforms.<br /><br />
-<img width="75%" alt="wayang architecture"
src="/img/architecture/wayang-schema.png" />
-<br /><br />
+### Architecture and Software stack
+Apache Wayang's unique architecture, unlike traditional DBMSs, decouples the
physical planning and execution layers, empowering developers to express their
data processing logic in a platform-agnostic fashion. This separation of
concerns allows developers to focus on the algorithmic aspects of their
applications without being constrained by the intricacies of specific
processing platforms.
+
+<br/>
+<img width="75%" alt="wayang stack" src="/img/architecture/wayang-stack.png"
/>
+<br />
+
+At the bottom layers of the software stack, there are the different data
storage mediums and the supported data processing platforms. On top of these,
Wayang’s core consists of the following main components: the optimizer, the
executor, the monitor, and platform-specific drivers. Wayang currently supports
two main APIs: the Java one and the Scala one. A Python API is currently under
development. Besides using any of the supported languages, users can directly
input SQL queries via the SQ [...]
+
+<br/>
+
Apache Wayang's core strength lies in its cross-platform task execution,
enabling developers to seamlessly leverage the strengths of various processing
engines, such as Hadoop, Spark, and Flink, without sacrificing performance or
flexibility. The platform's ease of use further enhances its appeal, making it
a compelling choice for data engineers and developers seeking a unified and
versatile data processing solution.
+<br/>
+Below you can see on the left, a Wayang plan representing the stochastic
gradient descent algorithm, which used in most deep learning tasks. On the
right, you can see how the optimizer decided to execute it. Orange nodes are
the operators that ran on Spark and green the operators that executed as a
single Java process.
+<br/>
+<img width="75%" alt="wayang plan" src="/img/architecture/wayang-plan.png" />
+<br/>
+
+### Query Optimizer
+Wayang's query optimizer is the principal component of the core. It receives
as input a Wayang plan and outputs an execution (platform-specific) plan (see
example plan above) with the goal of minimizing the total cost. The metric for
the cost can be anything, from execution runtime to monetary cost or energy
consumption.
+To achieve this, the optimizer first “inflates” the plan: For each node that
corresponds to a Wayang operator, it adds all the corresponding execution
(platform-specific) operators. Once the inflated plan is created, the optimizer
attaches not only the operator’s costs but also the costs for moving
intermediate data from one platform to another. By default, Wayang uses linear
cost formulas to estimate these costs but one can plug their own optimizer,
e.g., an ML-based one. At the last st [...]
<br />
-<img width="75%" alt="wayang architecture"
src="/img/architecture/wayang-plan.png" />
-<br />
\ No newline at end of file
+<img width="75%" alt="wayang optimizer"
src="/img/architecture/wayang-optimizer.png" />
+<br />
+Note that users can control the optimizer by specifying in their code where an
operator has to be executed via the `withTargetPlatform(plat)` call on the
desired operator. Then, the optimizer takes into consideration the decisions of
the user and outputs an execution plan by navigating a reduced search space
during the plan enumeration.
+<br />
+
+### History
+Wayang (formely called Rheem) is the product of many years of top quality
research. Below you can find the main publications:
+
+* Apache Wayang: A Unified Data Analytics Framework. SIGMOD Rec. 52(3): 30-35
(2023)
[pdf](https://sigmodrecord.org/publications/sigmodRecord/2309/pdfs/05_Systems_Beedkar.pdf)
+
+* RHEEMix in the data jungle: a cost-based optimizer for cross-platform
systems. VLDB J. 29(6): 1287-1310 (2020)
[pdf](https://link.springer.com/article/10.1007/s00778-020-00612-x)
+
+* RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With
You! -. Proc. VLDB Endow. 11(11): 1414-1427 (2018)
[pdf](http://www.vldb.org/pvldb/vol11/p1414-agrawal.pdf)
\ No newline at end of file
diff --git a/static/img/architecture/wayang-plan.png
b/static/img/architecture/wayang-plan.png
index 1af187af..6f31ec3d 100644
Binary files a/static/img/architecture/wayang-plan.png and
b/static/img/architecture/wayang-plan.png differ