This is an automated email from the ASF dual-hosted git repository.
jqin pushed a commit to branch release-1.11
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-1.11 by this push:
new 0ff86ee [hotfix][doc] Some minor languange clean-ups for the Source
doc.
0ff86ee is described below
commit 0ff86ee9dbc15cbebadb0a50412274254dd7f046
Author: Jiangjie (Becket) Qin <[email protected]>
AuthorDate: Wed Jun 10 12:06:07 2020 +0800
[hotfix][doc] Some minor languange clean-ups for the Source doc.
---
docs/dev/stream/sources.md | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/docs/dev/stream/sources.md b/docs/dev/stream/sources.md
index 3f20388..669ca8f 100644
--- a/docs/dev/stream/sources.md
+++ b/docs/dev/stream/sources.md
@@ -163,19 +163,19 @@ The
[SourceReader](https://github.com/apache/flink/blob/master/flink-core/src/ma
The `SourceReader` exposes a pull-based consumption interface. A Flink task
keeps calling `pollNext(ReaderOutput)` in a loop to poll records from the
`SourceReader`. The return value of the `pollNext(ReaderOutput)` method
indicates the status of the source reader.
- - `MORE_AVAILABLE` - The SourceReader has more records available
intermediately.
+ - `MORE_AVAILABLE` - The SourceReader has more records available immediately.
- `NOTHING_AVAILABLE` - The SourceReader does not have more records
available at this point, but may have more records in the future.
- `END_OF_INPUT` - The SourceReader has exhausted all the records and
reached the end of data. This means the SourceReader can be closed.
-In the interest of performance, a `ReaderOutput` is provided to the
`pollNext(ReaderOutput)` method, so a `SourceReader` can emit multiple records
in a single call of pollNext() if it has to. For example, sometimes the
external system works at the granularity of blocks. A block may contain
multiple records but the source can only checkpoint at the block boundaries. In
this case the `SourceReader` can emit one block at a time to the `ReaderOutput`.
-**However, the `SourceReader` implementation should avoid emitting multiple
records in a single `pollNext(ReaderOutput)` invocation unless necessary.**
+In the interest of performance, a `ReaderOutput` is provided to the
`pollNext(ReaderOutput)` method, so a `SourceReader` can emit multiple records
in a single call of pollNext() if it has to. For example, sometimes the
external system works at the granularity of blocks. A block may contain
multiple records but the source can only checkpoint at the block boundaries. In
this case the `SourceReader` can emit all the records in one block at a time to
the `ReaderOutput`.
+**However, the `SourceReader` implementation should avoid emitting multiple
records in a single `pollNext(ReaderOutput)` invocation unless necessary.**
This is because the task thread that is polling from the `SourceReader` works
in an event-loop and cannot block.
All the state of a `SourceReader` should be maintained inside the
`SourceSplit`s which are returned at the `snapshotState()` invocation. Doing
this allows the `SourceSplit`s to be reassigned to other `SourceReaders` when
needed.
-A `SourceReaderContext` is provided to the `Source` upon a `SourceReader`
creation. It is expected that the `Source` will pass the context to the
`SourceReader` instance. The `SourceReader` can send `SourceEvent` to its
`SplitEnumerator`. A typical design pattern for the `Source` is letting the
`SourceReader`s report their local information to the `SplitEnumerator` who has
a global view to make decisions.
+A `SourceReaderContext` is provided to the `Source` upon a `SourceReader`
creation. It is expected that the `Source` will pass the context to the
`SourceReader` instance. The `SourceReader` can send `SourceEvent` to its
`SplitEnumerator` through the `SourceReaderContext`. A typical design pattern
of the `Source` is letting the `SourceReader`s report their local information
to the `SplitEnumerator` who has a global view to make decisions.
-The `SourceReader` API is a low level API that allows users to deal with the
splits manually and have their own threading model to fetch and handover the
records. To facilitate the `SourceReader` implementation, Flink has provided a
[SourceReaderBase](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java)
class which significantly reduce the amount the work needed to write a `Sou
[...]
-**It is highly recommended for the connector developers to take advantage of
the `SourceReaderBase` instead of writing the `SourceReader`s by themselves
from scratch**. For more details please check the [Split Reader
API](#the-split-reader-api) section.
+The `SourceReader` API is a low level API that allows users to deal with the
splits manually and have their own threading model to fetch and handover the
records. To facilitate the `SourceReader` implementation, Flink has provided a
[SourceReaderBase](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java)
class which significantly reduces the amount the work needed to write a `So
[...]
+**It is highly recommended for the connector developers to take advantage of
the `SourceReaderBase` instead of writing the `SourceReader`s from scratch**.
For more details please check the [Split Reader API](#the-split-reader-api)
section.
### Use the Source
In order to create a `DataStream` from a `Source`, one needs to pass the
`Source` to a `StreamExecutionEnvironment`. For example,
@@ -236,9 +236,9 @@ Please check the Java doc of the class for more details.
It is quite common that a `SourceReader` implementation does the following:
- Have a pool of threads fetching from splits of the external system in a
blocking way.
- - Handle the synchronization between the fetching threads mentioned above
and the `pollNext(ReaderOutput)` invocation.
+ - Handle the synchronization between the internal fetching threads and other
methods invocations such as `pollNext(ReaderOutput)`.
- Maintain the per split watermark for watermark alignment.
- - Maintain the state of each split for checkpoint
+ - Maintain the state of each split for checkpoint.
In order to reduce the work of writing a new `SourceReader`, Flink provides a
[SourceReaderBase](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java)
class to serve as a base implementation of the `SourceReader`.
`SourceReaderBase` has all the above work done out of the box. To write a new
`SourceReader`, one can just let the `SourceReader` implementation inherit from
the `SourceReaderBase`, fill in a few methods and implement a high level
[SplitReader](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/splitreader/SplitReader.java).