This is an automated email from the ASF dual-hosted git repository.
eskabetxe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/bahir-website.git
The following commit(s) were added to refs/heads/master by this push:
new 62bce83 [BAHIR-293] Fix flink documentation
62bce83 is described below
commit 62bce83563e9d14722ba533e33aedc0b543f3e11
Author: Joao Boto <[email protected]>
AuthorDate: Tue Dec 7 11:33:58 2021 +0100
[BAHIR-293] Fix flink documentation
---
site/Gemfile | 3 +++
site/docs/flink/current/documentation.md | 4 ++++
site/docs/flink/current/flink-streaming-kudu.md | 25 ++++++++++++------------
site/docs/flink/current/flink-streaming-pinot.md | 9 ++++++++-
4 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/site/Gemfile b/site/Gemfile
index c13c470..f49e038 100644
--- a/site/Gemfile
+++ b/site/Gemfile
@@ -14,6 +14,9 @@
# limitations under the License.
#
source 'https://rubygems.org'
+ruby "2.7.0"
+gem 'jekyll', '= 3.9.0'
+
gem 'github-pages'
gem 'rouge'
gem 'jekyll-oembed', :require => 'jekyll_oembed'
diff --git a/site/docs/flink/current/documentation.md b/site/docs/flink/current/documentation.md
index 1af2a3c..969c97f 100644
--- a/site/docs/flink/current/documentation.md
+++ b/site/docs/flink/current/documentation.md
@@ -39,8 +39,12 @@ limitations under the License.
[InfluxDB connector](../flink-streaming-influxdb)
+[InfluxDB2 connector](../flink-streaming-influxdb2)
{:height="36px" width="36px"}
+
[Kudu connector](../flink-streaming-kudu)
[Netty connector](../flink-streaming-netty)
+[Pinot connector](../flink-streaming-pinot)
{:height="36px" width="36px"}
+
[Redis connector](../flink-streaming-redis)
diff --git a/site/docs/flink/current/flink-streaming-kudu.md b/site/docs/flink/current/flink-streaming-kudu.md
index 2af5e9a..28dec2c 100644
--- a/site/docs/flink/current/flink-streaming-kudu.md
+++ b/site/docs/flink/current/flink-streaming-kudu.md
@@ -181,18 +181,19 @@ The example uses lambda expressions to implement the functional interfaces.
Read more about Kudu schema design in the [Kudu docs](https://kudu.apache.org/docs/schema_design.html).
### Supported data types
-| Flink/SQL | Kudu |
-| ------------- |:-------------:|
-| STRING | STRING |
-| BOOLEAN | BOOL |
-| TINYINT | INT8 |
-| SMALLINT | INT16 |
-| INT | INT32 |
-| BIGINT | INT64 |
-| FLOAT | FLOAT |
-| DOUBLE | DOUBLE |
-| BYTES | BINARY |
-| TIMESTAMP(3) | UNIXTIME_MICROS |
+
+| Flink/SQL | Kudu |
+|----------------------|:-----------------------:|
+| `STRING` | STRING |
+| `BOOLEAN` | BOOL |
+| `TINYINT` | INT8 |
+| `SMALLINT` | INT16 |
+| `INT` | INT32 |
+| `BIGINT` | INT64 |
+| `FLOAT` | FLOAT |
+| `DOUBLE` | DOUBLE |
+| `BYTES` | BINARY |
+| `TIMESTAMP(3)` | UNIXTIME_MICROS |
Note:
* `TIMESTAMP`s are fixed to a precision of 3, and the corresponding Java conversion class is `java.sql.Timestamp`
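For illustration, here is a minimal sketch of the `TIMESTAMP(3)` to `UNIXTIME_MICROS` mapping, using only plain JDK types (this is not the connector's internal code):

```java
import java.sql.Timestamp;

public class TimestampMapping {
    public static void main(String[] args) {
        // Flink TIMESTAMP(3) values are exchanged as java.sql.Timestamp,
        // which carries millisecond precision here
        Timestamp flinkValue = Timestamp.valueOf("2021-12-07 11:33:58.123");

        // Kudu's UNIXTIME_MICROS stores microseconds since the Unix epoch,
        // so the millisecond value is scaled by 1000
        long kuduMicros = flinkValue.getTime() * 1000L;
        System.out.println(kuduMicros);
    }
}
```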
diff --git a/site/docs/flink/current/flink-streaming-pinot.md b/site/docs/flink/current/flink-streaming-pinot.md
index b5a705a..b4c9a7b 100644
--- a/site/docs/flink/current/flink-streaming-pinot.md
+++ b/site/docs/flink/current/flink-streaming-pinot.md
@@ -44,12 +44,14 @@ See how to link with them for cluster execution [here](https://ci.apache.org/pro
The sink class is called `PinotSink`.
## Architecture
+
The Pinot sink stores elements from upstream Flink tasks in an Apache Pinot table.
We support two execution modes:
* `RuntimeExecutionMode.BATCH`
* `RuntimeExecutionMode.STREAMING`, which requires checkpointing to be enabled (see the sketch below).
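A minimal sketch of the `STREAMING` setup (standard Flink API; the checkpoint interval is only illustrative):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
// STREAMING mode requires checkpointing to be enabled
env.enableCheckpointing(10_000L); // illustrative 10 second interval
```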
### PinotSinkWriter
+
Elements arriving from upstream tasks are received by an instance of the `PinotSinkWriter`.
The `PinotSinkWriter` holds a list of `PinotWriterSegment`s where each `PinotWriterSegment` is capable of storing `maxRowsPerSegment` elements.
As long as a `PinotWriterSegment` has not yet reached its maximum number of elements, it is considered active.
@@ -60,7 +62,8 @@ Once the maximum number of elements to hold was reached, an active `PinotWriterS
Thus, there is always one active `PinotWriterSegment` that new incoming elements will go to.
Over time, the list of `PinotWriterSegment`s per `PinotSinkWriter` grows until a checkpoint is created.
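A simplified sketch of this buffering behaviour (a hypothetical class for illustration, not the connector's actual implementation; only the `maxRowsPerSegment` name is taken from the text above):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the segment buffering described above
class SegmentBuffer<T> {
    private final int maxRowsPerSegment;
    private final List<List<T>> segments = new ArrayList<>();

    SegmentBuffer(int maxRowsPerSegment) {
        this.maxRowsPerSegment = maxRowsPerSegment;
    }

    void write(T element) {
        // The last segment in the list is the active one; once it is full,
        // it becomes inactive and a new active segment is started
        if (segments.isEmpty()
                || segments.get(segments.size() - 1).size() >= maxRowsPerSegment) {
            segments.add(new ArrayList<>());
        }
        segments.get(segments.size() - 1).add(element);
    }
}
```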
-**Checkpointing**
+**Checkpointing**
+
On checkpoint creation, `PinotSinkWriter.prepareCommit` gets called by the Flink environment.
This triggers the creation of `PinotSinkCommittable`s where each inactive `PinotWriterSegment` creates exactly one `PinotSinkCommittable`.
@@ -72,6 +75,7 @@ A `PinotSinkCommittables` then holds the path to the data file on the shared fil
### PinotSinkGlobalCommitter
+
To follow the guidelines for Pinot segment naming, the minimum and maximum timestamps of a segment's elements need to be included in its metadata and in its name.
The minimum and maximum timestamps of all elements between two checkpoints are determined at a parallelism of 1 in the `PinotSinkGlobalCommitter`.
This procedure allows recovering from failures by deleting previously uploaded segments, which prevents duplicate segments in the Pinot table.
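As an illustration of the metadata involved, a hedged sketch of what a committable could carry (a hypothetical record; the connector's actual committable class may differ):

```java
// Hypothetical sketch of a committable as described above: the path to the
// segment's data file on the shared filesystem plus the minimum and maximum
// timestamps of the elements it contains
record SegmentCommittable(String dataFilePath, long minTimestamp, long maxTimestamp) {}
```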
@@ -90,10 +94,12 @@ When finally committing a `PinotSinkGlobalCommittable` the following procedure i
## Delivery Guarantees
+
As a result of the architecture described above, the `PinotSink` provides an at-least-once delivery guarantee.
While the failure recovery mechanism prevents duplicate segments, there might be temporary inconsistencies in the Pinot table, which can result in downstream tasks receiving an element multiple times.
## Options
+
| Option                | Description                                                                       |
| --------------------- | --------------------------------------------------------------------------------- |
| `pinotControllerHost` | Host of the Pinot controller                                                      |
@@ -108,6 +114,7 @@ While the failure recovery mechanism ensures that duplicate segments are prevent
| `numCommitThreads`    | Number of threads used in the `PinotSinkGlobalCommitter` for committing segments  |
## Usage
+
```java
StreamExecutionEnvironment env = ...
// Checkpointing needs to be enabled when executing in STREAMING mode