echauchot commented on code in PR #641:
URL: https://github.com/apache/flink-web/pull/641#discussion_r1171417949


##########
docs/content/posts/2023-04-13-howto-create-batch-source.md:
##########
@@ -0,0 +1,221 @@
+---
+title:  "Howto create a batch source with the new Source framework"
+date: "2023-04-13T08:00:00.000Z"
+authors:
+- echauchot:
+  name: "Etienne Chauchot"
+  twitter: "echauchot"
+aliases:
+- /2023/04/13/howto-create-batch-source.html
+---
+
+## Introduction
+
+Flink community has
+designed [a new Source 
framework](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/sources/)
+based
+on 
[FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface)
+lately. Some connectors have migrated to this new framework. This article is 
an howto create a batch
+source using this new framework. It was built while implementing
+the [Flink batch 
source](https://github.com/apache/flink-connector-cassandra/commit/72e3bef1fb9ee6042955b5e9871a9f70a8837cca)
+for [Cassandra](https://cassandra.apache.org/_/index.html). I felt it could be 
useful to people
+interested in contributing or migrating connectors.
+
+## Implementing the source components
+
+The aim here is not to duplicate the official documentation. For details you 
should read the
+documentation, the javadocs or the Cassandra connector code. The links are 
above. The goal here is
+to give field feedback on how to implement the different components.
+
+The source architecture is depicted in the diagrams below:
+
+![](/img/blog/2023-04-13-howto-create-batch-source/source_components.svg)
+
+![](/img/blog/2023-04-13-howto-create-batch-source/source_reader.svg)
+
+### Source
+[example Cassandra 
Source](https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/CassandraSource.java)
+
+The source interface only does the "glue" between all the other components. 
Its role is to
+instantiate all of them and to define the source 
[Boundedness](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/Boundedness.html).
 We also do the source configuration
+here along with user configuration validation.
+
+### SourceReader
+[example Cassandra 
SourceReader](https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/reader/CassandraSourceReader.java)
+
+As shown in the graphic above, the instances of the 
[SourceReader](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SourceReader.html)
 (which we will call simply readers
+in the continuation of this article) run in parallel in task managers to read 
the actual data which
+is divided into [Splits](#split-and-splitstate). Readers request splits from 
the [SplitEnumerator](#splitenumerator-and-splitenumeratorstate) and the 
resulting splits are
+assigned to them in return.
+
+Luckily for us Flink provides the 
[SourceReaderBase](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.html)
 implementation that takes care of the
+synchronization between the readers and the main thread. Flink also provides a 
useful extension to

Review Comment:
   I was referring to the mailbox main thread but I did not want to give too 
much detail to the reader. Especially avoid describing Flink internals because 
it is off topic of the current article.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to