This is an automated email from the ASF dual-hosted git repository.
aloalt pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-wayang-website.git
The following commit(s) were added to refs/heads/main by this push:
new 4cc41479 Delete blog/kafka-meets-wayang-2.md
4cc41479 is described below
commit 4cc41479b273ce118542837d124eaad572f6e50f
Author: Alexander Alten <[email protected]>
AuthorDate: Sun Mar 10 11:05:04 2024 +0100
Delete blog/kafka-meets-wayang-2.md
double
---
blog/kafka-meets-wayang-2.md | 95 --------------------------------------------
1 file changed, 95 deletions(-)
diff --git a/blog/kafka-meets-wayang-2.md b/blog/kafka-meets-wayang-2.md
deleted file mode 100644
index bfd2abbd..00000000
--- a/blog/kafka-meets-wayang-2.md
+++ /dev/null
@@ -1,95 +0,0 @@
----
-slug: Apache Kafka meets Apache Wayang
-title: Apache Kafka meets Apache Wayang - Part 2
-authors: kamir
-tags: [wayang, kafka, cross organization data collaboration]
----
-
-# Apache Kafka meets Apache Wayang - Part 2
-
-In the second part of this article series, we describe the implementation of the Kafka Source and Kafka Sink components for Apache Wayang.
-We look at the read and write paths for our data items, called _DataQuanta_.
-
-## Apache Wayang’s Read & Write Path for Kafka topics
-
-To describe the read and write paths for data in the context of the Apache Wayang code snippet we created, the primary classes and interfaces we need to understand are as follows:
-
-**WayangContext:** This class is essential for initializing the Wayang processing environment.
-It allows you to configure the execution environment and register plugins that define which platforms Wayang can use for data processing tasks, such as _Java.basicPlugin()_ for local Java execution.
-
-**JavaPlanBuilder:** This class is used to build and define the data processing pipeline (or plan) in Wayang.
-It provides a fluent API to specify the operations to be performed on the data, from reading the input to processing it and writing the output.
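-
-A minimal sketch of this setup is shown below; the job name is an illustrative choice:
-
-```java
-import org.apache.wayang.api.JavaPlanBuilder;
-import org.apache.wayang.core.api.Configuration;
-import org.apache.wayang.core.api.WayangContext;
-import org.apache.wayang.java.Java;
-
-// Initialize the Wayang processing environment and register the Java platform.
-WayangContext context = new WayangContext(new Configuration())
-        .withPlugin(Java.basicPlugin());
-
-// The plan builder offers the fluent API used throughout this article.
-JavaPlanBuilder planBuilder = new JavaPlanBuilder(context)
-        .withJobName("kafka-meets-wayang");   // hypothetical job name
-```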
-
-### Read Path
-The read path describes how data is ingested from a source into the Wayang processing pipeline:
-
-_Reading from a Kafka Topic:_ The method _readKafkaTopic(topicName)_ is used to ingest data from a specified Kafka topic.
-This is the starting point of the data processing pipeline, where _topicName_ represents the name of the Kafka topic from which data is read.
-
-_Data Tokenization and Preparation:_ Once the data is read from Kafka, it undergoes several transformations such as splitting, filtering, and mapping.
-These are followed by operations such as reducing, grouping, co-grouping, and counting.
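-
-Using the _planBuilder_ from the sketch above, a read path for a simple word count might look as follows; the topic name is a placeholder, and the operator chain mirrors Wayang's usual word-count pattern:
-
-```java
-import java.util.Arrays;
-import java.util.Collection;
-import org.apache.wayang.basic.data.Tuple2;
-
-Collection<Tuple2<String, Integer>> wordCounts = planBuilder
-        .readKafkaTopic("wordcount-input")                     // ingest records from the topic
-        .flatMap(line -> Arrays.asList(line.split("\\W+")))    // split each record into tokens
-        .filter(token -> !token.isEmpty())                     // drop empty tokens
-        .map(token -> new Tuple2<>(token.toLowerCase(), 1))    // prepare (word, 1) pairs
-        .reduceByKey(Tuple2::getField0,                        // count occurrences per word
-                (t1, t2) -> new Tuple2<>(t1.getField0(), t1.getField1() + t2.getField1()))
-        .collect();
-```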
-
-### Write Path
-_Writing to a Kafka Topic:_ The final step in the pipeline involves writing the processed data back to a Kafka topic using _.writeKafkaTopic(...)_.
-This method takes parameters that specify the target Kafka topic, a serialization function to format the data as strings, and additional configuration for load profile estimation, which optimizes the writing process.
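-
-A sketch of the write path is shown below. The topic names, the serializer lambda, and the trailing job-name argument follow the parameter description above but are assumptions, not the exact signature:
-
-```java
-planBuilder
-        .readKafkaTopic("wordcount-input")     // placeholder source topic
-        .map(String::toUpperCase)              // some processing step
-        .writeKafkaTopic(
-                "wordcount-output",            // placeholder target topic
-                record -> record,              // serialization function: quanta are already strings here
-                "kafka-write-path");           // job name used for load profile estimation
-```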
-
-This read-write path provides a comprehensive flow of data from ingestion from Kafka, through various processing steps, and finally back to Kafka, showcasing a full cycle of data processing within Apache Wayang's abstracted environment; it is implemented in our example program shown in *listing 1*.
-
-## Implementation of Input and Output Operators
-The next section shows how a new pair of operators can be implemented to extend Apache Wayang's capabilities on the input and output side.
-We created the Kafka Source and Kafka Sink components so that our cross-organizational data collaboration scenario can be implemented on top of data streaming infrastructure.
-
-**Level 1 – Wayang execution plan with abstract operators**
-
-The implementation of our Kafka Source and Kafka Sink components for Apache Wayang requires new methods and classes on three layers.
-The first of these is the API package.
-Here we use the JavaPlanBuilder to expose the function for selecting a Kafka topic as the source to be used by the client.
-The class _JavaPlanBuilder_ in the package _org.apache.wayang.api_ in the project *wayang-api/wayang-api-scala-java* exposes our new functionality to external clients.
-An instance of the JavaPlanBuilder is used to define the data processing pipeline.
-We use its _readKafkaTopic()_ method, which specifies the source Kafka topic to read from; for the write path we use the _writeKafkaTopic()_ method.
-Both methods only declare these endpoints; the actual activities are triggered in the background once the plan is executed.
-
-For the output side, we use the _DataQuantaBuilder_ class, which offers an implementation of the _writeKafkaTopic_ function.
-This function is designed to send processed data, referred to as _DataQuanta_, to a specified Kafka topic.
-Essentially, it marks the final step in a data processing sequence constructed with the Apache Wayang framework.
-
-In the _DataQuanta_ class we implemented the methods _writeKafkaTopic_ and _writeKafkaTopicJava_, which use the _KafkaTopicSink_ class.
-This API layer is written in the Scala programming language, but it utilizes the Java classes implemented in the layer below.
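-
-Conceptually, these methods only wrap the new sink operator and attach it to the plan. The sketch below (rendered in Java for consistency with the other snippets, although the actual layer is Scala) illustrates that delegation; the signatures and the connectTo call are simplified assumptions, not the actual Wayang code:
-
-```java
-// Hypothetical, simplified sketch of the API-layer delegation: the write method
-// wraps topic name and serializer in a KafkaTopicSink and connects it to the plan.
-public void writeKafkaTopicJava(String topicName,
-        FunctionDescriptor.SerializableFunction<T, String> formatterUdf) {
-    KafkaTopicSink<T> sink = new KafkaTopicSink<>(topicName, formatterUdf);
-    // Feed this DataQuanta's output (slot 0) into the sink's input (slot 0).
-    this.operator.connectTo(0, sink, 0);
-}
-```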
-
-**Level 2 – Wiring between Platform Abstraction and Implementation**
-
-The second layer builds the bridge between the WayangContext and the PlanBuilders, which work together with DataQuanta and the DataQuantaBuilder.
-
-Also, the mappings between the abstract components and their specific implementations are defined in this layer.
-
-Therefore, the mappings package has a class _Mappings_ in which all relevant input and output operators are listed.
-We use it to register a KafkaSourceMapping and a KafkaSinkMapping for the particular platform, Java in our case.
-These classes allow the Apache Wayang framework to use the Java implementation of the KafkaTopicSource component (and of the KafkaTopicSink, respectively).
-While the Wayang execution plan uses the higher abstractions, here on the “platform level” we have to link the specific implementation for the target platform.
-In our case this leads to a Java program running on a JVM, which is set up by the Apache Wayang framework from the logical components of the execution plan.
-
-These mappings link the real implementations of our operators to the ones used in an execution plan.
-The JavaKafkaTopicSource and the JavaKafkaTopicSink extend the KafkaTopicSource and KafkaTopicSink so that the lower-level implementations of those classes become available within Wayang's Java Platform context, as sketched below.
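-
-The following sketch shows the general shape of such a mapping, modeled on the pattern of Wayang's existing source mappings; the constructors of KafkaTopicSource and JavaKafkaTopicSource are illustrative assumptions:
-
-```java
-import java.util.Collection;
-import java.util.Collections;
-import org.apache.wayang.core.mapping.Mapping;
-import org.apache.wayang.core.mapping.OperatorPattern;
-import org.apache.wayang.core.mapping.PlanTransformation;
-import org.apache.wayang.core.mapping.ReplacementSubplanFactory;
-import org.apache.wayang.core.mapping.SubplanPattern;
-import org.apache.wayang.java.platform.JavaPlatform;
-
-// Illustrative sketch of a source mapping for the Java platform: it matches the
-// abstract KafkaTopicSource in the plan and replaces it with JavaKafkaTopicSource.
-public class KafkaSourceMapping implements Mapping {
-
-    @Override
-    public Collection<PlanTransformation> getTransformations() {
-        return Collections.singleton(new PlanTransformation(
-                this.createSubplanPattern(),            // what to match in the plan
-                this.createReplacementSubplanFactory(), // what to replace it with
-                JavaPlatform.getInstance()));           // the target platform
-    }
-
-    private SubplanPattern createSubplanPattern() {
-        // Match any occurrence of the abstract KafkaTopicSource operator.
-        OperatorPattern pattern = new OperatorPattern(
-                "source", new KafkaTopicSource((String) null), false);
-        return SubplanPattern.createSingleton(pattern);
-    }
-
-    private ReplacementSubplanFactory createReplacementSubplanFactory() {
-        // Substitute the matched operator with its Java-platform counterpart.
-        return new ReplacementSubplanFactory.OfSingleOperators<KafkaTopicSource>(
-                (matched, epoch) -> new JavaKafkaTopicSource(matched).at(epoch));
-    }
-}
-```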
-
-In this layer, the KafkaConsumer class and the KafkaProducer class are used, but both are configured and instantiated in the next layer underneath.
-All this is done in the project *wayang-platforms/wayang-java*.
-
-**Level 3 – Input/Output Connector Layer**
-
-The _KafkaTopicSource_ and _KafkaTopicSink_ classes form the third layer of our implementation.
-Both are implemented in the Java programming language.
-In this layer, the real Kafka client logic is defined.
-Details of consumers and producers, client configuration, and schema handling are dealt with here.
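-
-As an illustration of the client logic living in this layer, the following sketch uses the plain Kafka consumer and producer APIs; the broker address, group id, and topic names are placeholder values:
-
-```java
-import java.time.Duration;
-import java.util.Collections;
-import java.util.Properties;
-import org.apache.kafka.clients.consumer.ConsumerRecord;
-import org.apache.kafka.clients.consumer.ConsumerRecords;
-import org.apache.kafka.clients.consumer.KafkaConsumer;
-import org.apache.kafka.clients.producer.KafkaProducer;
-import org.apache.kafka.clients.producer.ProducerRecord;
-
-Properties consumerProps = new Properties();
-consumerProps.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
-consumerProps.put("group.id", "wayang-kafka-source");       // placeholder consumer group
-consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
-consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
-
-// Read path: each consumed record value becomes one data quantum.
-try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
-    consumer.subscribe(Collections.singletonList("wordcount-input"));
-    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
-    for (ConsumerRecord<String, String> record : records) {
-        System.out.println(record.value());
-    }
-}
-
-Properties producerProps = new Properties();
-producerProps.put("bootstrap.servers", "localhost:9092");
-producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
-producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
-
-// Write path: each processed data quantum is published as one record.
-try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
-    producer.send(new ProducerRecord<>("wordcount-output", "hello wayang"));
-}
-```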
-
-## Summary
-Both classes in the third layer implement the Kafka client logic that the Wayang execution plan needs whenever external data flows have to be established.
-The layer above handles the mapping of the components at startup time.
-All this wiring is needed to keep Wayang open and flexible, so that multiple external systems can be combined and multiple target platforms can be used.
-
-## Outlook
-The next part of this article series will cover the creation of a Kafka Source and Sink component for the Apache Spark platform, which allows our use case to scale.
-Finally, in part four we will bring all the puzzle pieces together and show the full implementation of the multi-organizational data collaboration use case.
-