This is an automated email from the ASF dual-hosted git repository.
liuyu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new 5da9be9f558 [improve][doc] Functions doc improvements Phase2 (#16172)
5da9be9f558 is described below
commit 5da9be9f5582256eea4e5b62d476c5f038eea0bf
Author: momo-jun <[email protected]>
AuthorDate: Fri Aug 19 11:16:20 2022 +0800
[improve][doc] Functions doc improvements Phase2 (#16172)
---
.../function-count-based-tumbling-window.png | Bin 156363 -> 0 bytes
.../function-count-based-tumbling-window.svg | 1 +
site2/docs/assets/function-data-window.png | Bin 334827 -> 0 bytes
site2/docs/assets/function-data-window.svg | 1 +
site2/docs/assets/function-sliding-window.png | Bin 150032 -> 0 bytes
site2/docs/assets/function-sliding-window.svg | 1 +
.../assets/function-time-based-tumbling-window.png | Bin 150774 -> 0 bytes
.../assets/function-time-based-tumbling-window.svg | 1 +
site2/docs/functions-cli.md | 2 +-
site2/docs/functions-concepts.md | 21 ++++----
site2/docs/functions-overview.md | 54 ++++++++++-----------
site2/docs/functions-runtime-process.md | 2 +
site2/docs/functions-runtime-thread.md | 8 ++-
site2/docs/io-overview.md | 2 +-
14 files changed, 50 insertions(+), 43 deletions(-)
diff --git a/site2/docs/assets/function-count-based-tumbling-window.png b/site2/docs/assets/function-count-based-tumbling-window.png
deleted file mode 100644
index c5ad6b613c5..00000000000
Binary files a/site2/docs/assets/function-count-based-tumbling-window.png and /dev/null differ
diff --git a/site2/docs/assets/function-count-based-tumbling-window.svg b/site2/docs/assets/function-count-based-tumbling-window.svg
new file mode 100644
index 00000000000..1d5d104d5dd
--- /dev/null
+++ b/site2/docs/assets/function-count-based-tumbling-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="909.87"
height="343.91"><g transform="translate(-17.333333333294433
-456.5918872073414)" lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z"
fill="#fff"/><path d="M195.33 569.26a6 6 0 0 1 6-6h102.25a6 6 0 0 1 6 6v92.17a6
6 0 0 1-6 6H201.33a6 6 0 0 1-6-6z" fill="#fff"/><path d="M199.04
563.74l.44-.18.9-.22h1.03m4.05 0h2.03m4.04 0h2.03m4.04-.02h2.02m4.04
0h2.03m4.03 0h2.03m4. [...]
\ No newline at end of file
diff --git a/site2/docs/assets/function-data-window.png b/site2/docs/assets/function-data-window.png
deleted file mode 100644
index 45c3cfca894..00000000000
Binary files a/site2/docs/assets/function-data-window.png and /dev/null differ
diff --git a/site2/docs/assets/function-data-window.svg b/site2/docs/assets/function-data-window.svg
new file mode 100644
index 00000000000..ed2c5ac53eb
--- /dev/null
+++ b/site2/docs/assets/function-data-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="1196.89"
height="438.72"><g transform="translate(-61.99999999996112 -362.8796296530241)"
lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z" fill="#fff"/><path
d="M254 505.43a6 6 0 0 1 6-6h844.98a6 6 0 0 1 6 6v156a6 6 0 0 1-6 6H260a6 6 0 0
1-6-6z" fill="#fff"/><path d="M255 505.43c0 .56-.45 1-1 1s-1-.44-1-1c0-.55.45-1
1-1s1 .45 1 1zm1.45-3.88c0 .55-.45 1-1 1s-1-.45-1-1 .45- [...]
\ No newline at end of file
diff --git a/site2/docs/assets/function-sliding-window.png b/site2/docs/assets/function-sliding-window.png
deleted file mode 100644
index bf66a761819..00000000000
Binary files a/site2/docs/assets/function-sliding-window.png and /dev/null differ
diff --git a/site2/docs/assets/function-sliding-window.svg b/site2/docs/assets/function-sliding-window.svg
new file mode 100644
index 00000000000..efdac727165
--- /dev/null
+++ b/site2/docs/assets/function-sliding-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="998.67"
height="397.78"><g transform="translate(-566.6666666666472
-336.30936180364023)" lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z"
fill="#fff"/><path d="M751.92 517.84h40v60h-40z" stroke="#474e55"
stroke-width="2" fill="#188fff"/><g fill="none"><path d="M1196.67
363.4v79.68h-458V363.4z"/><path d="M738.67 443.08c0-15.8 12.8-28.62
28.62-28.62h171.74c15.8 0 28.63-12. [...]
\ No newline at end of file
diff --git a/site2/docs/assets/function-time-based-tumbling-window.png b/site2/docs/assets/function-time-based-tumbling-window.png
deleted file mode 100644
index 610347e2e5b..00000000000
Binary files a/site2/docs/assets/function-time-based-tumbling-window.png and /dev/null differ
diff --git a/site2/docs/assets/function-time-based-tumbling-window.svg b/site2/docs/assets/function-time-based-tumbling-window.svg
new file mode 100644
index 00000000000..4ef6660fdc8
--- /dev/null
+++ b/site2/docs/assets/function-time-based-tumbling-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="1196.89"
height="364.2"><g transform="translate(-61.99999999996112 -437.3961096847811)"
lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z" fill="#fff"/><path
d="M234 569.26a6 6 0 0 1 6-6h435.92a6 6 0 0 1 6 6v92.17a6 6 0 0 1-6 6H240a6 6 0
0 1-6-6z" fill="#fff"/><path d="M237.7 563.74l.45-.18.9-.22h1m4 0h2m3.98 0h2m4
0H258m4 0h2m4 0h1.98m4 0h1.98m4 0h2m3.98 0h2m3.98 0h2m4 0h [...]
\ No newline at end of file
diff --git a/site2/docs/functions-cli.md b/site2/docs/functions-cli.md
index c2c4f3ee9fa..42bd38dc310 100644
--- a/site2/docs/functions-cli.md
+++ b/site2/docs/functions-cli.md
@@ -46,7 +46,7 @@ You can configure a function by using a predefined YAML file. The following tabl
| userConfig | Map`<String,Object>` | `--user-config` | User-defined config key/values. |
| secrets | Map`<String,Object>` | `--secrets` | The mapping from secretName to objects that encapsulate how the secret is fetched by the underlying secrets provider. |
| runtime | String | N/A | The runtime of a function. Available values: `java`,`python`, `go`. |
-| autoAck | Boolean | `--auto-ack` | Whether the framework acknowledges messages automatically or not. <br /><br />**Tip**: This configuration will be deprecated. If the user specifies delivery semantics, the framework will automatically ack messages. If you do not want the framework to ack messages, set the **processingGuarantees** to `MANUAL`. |
+| autoAck | Boolean | `--auto-ack` | Whether the framework acknowledges messages automatically or not. <br /><br />**Note**: This configuration will be deprecated in future releases. If you specify a delivery semantic, the framework automatically acknowledges messages. If you do not want the framework to auto-ack messages, set the `processingGuarantees` to `MANUAL`. |
| maxMessageRetries | Int | `--max-message-retries` | The number of retries to process a message before giving up. |
| deadLetterTopic | String | `--dead-letter-topic` | The topic used for storing messages that are not processed successfully. |
| subName | String | `--subs-name` | The name of Pulsar source subscription used for input-topic consumers if required.|
diff --git a/site2/docs/functions-concepts.md b/site2/docs/functions-concepts.md
index cfbba46aaf2..4f40f8ec7be 100644
--- a/site2/docs/functions-concepts.md
+++ b/site2/docs/functions-concepts.md
@@ -75,10 +75,10 @@ Pulsar provides three different messaging delivery semantics that you can apply
| Delivery semantics | Description | Adopted subscription type |
|--------------------|-------------|---------------------------|
-| **At-most-once** delivery | Each message sent to a function is processed at its best effort. There’s no guarantee that the message will be processed or not. <br /><br /> When setting At-most-once, the `autoAck` configuration must be equal to true, otherwise the startup will fail(`autoAck` configuration will be deprecated in future releases). <br/><br/> **Ack time node**: Before function processing. | Shared |
-| **At-least-once** delivery (default) | Each message sent to the function can be processed more than once (in case of a processing failure or redelivery).<br /><br />If you create a function without specifying the `--processing-guarantees` flag, the function provides `at-least-once` delivery guarantee. <br/><br/> **Ack time node**: After sending a message to output. | Shared |
-| **Effectively-once** delivery | Each message sent to the function can be processed more than once but it has only one output. Duplicated messages are ignored.<br /><br />`Effectively once` is achieved on top of `at-least-once` processing and guaranteed server-side deduplication. This means a state update can happen twice, but the same state update is only applied once, the other duplicated state update is discarded on the server-side. <br/><br/> **Ack time node**: After sending a messa [...]
-| **Manual** delivery | Under this semantics, the user needs to call the method `context.getCurrentRecord().ack()` inside the function to manually perform the ack operation, and the framework will not help users to do any ack operations. <br/><br/> **Ack time node**: User decides, in function method. | Shared |
+| **At-most-once** delivery | Each message sent to a function is processed at its best effort. There’s no guarantee that the message will be processed or not. <br /><br /> When you select this semantic, the `autoAck` configuration must be set to `true`, otherwise the startup will fail (the `autoAck` configuration will be deprecated in future releases). <br /><br /> **Ack time node**: Before function processing. | Shared |
+| **At-least-once** delivery (default) | Each message sent to a function can be processed more than once (in case of a processing failure or redelivery).<br /><br />If you create a function without specifying the `--processing-guarantees` flag, the function provides `at-least-once` delivery guarantee. <br /><br /> **Ack time node**: After sending a message to output. | Shared |
+| **Effectively-once** delivery | Each message sent to a function can be processed more than once but it has only one output. Duplicated messages are ignored.<br /><br />`Effectively once` is achieved on top of `at-least-once` processing and guaranteed server-side deduplication. This means a state update can happen twice, but the same state update is only applied once, the other duplicated state update is discarded on the server-side. <br /><br /> **Ack time node**: After sending a messa [...]
+| **Manual** delivery | When you select this semantic, the framework does not perform any ack operations, and you need to call the method `context.getCurrentRecord().ack()` inside a function to manually perform the ack operation. <br /><br /> **Ack time node**: User-defined within function methods. | Shared |
:::tip
@@ -149,13 +149,13 @@ Pulsar Functions take byte arrays as inputs and spit out byte arrays as output.
:::note
-Currently, window function is only available in Java.
+Currently, window function is only available in Java, and does not support `MANUAL` and `Effectively-once` delivery semantics.
:::
Window function is a function that performs computation across a data window, that is, a finite subset of the event stream. As illustrated below, the stream is split into “buckets” where functions can be applied.
-
+
The definition of a data window for a function involves two policies:
* Eviction policy: Controls the amount of data collected in a window.
@@ -168,9 +168,6 @@ Both trigger policy and eviction policy are driven by either time or count.
Both processing time and event time are supported.
* Processing time is defined based on the wall time when the function instance builds and processes a window. The judging of window completeness is straightforward and you don’t have to worry about data arrival disorder.
* Event time is defined based on the timestamps that come with the event record. It guarantees event time correctness but also offers more data buffering and a limited completeness guarantee.
-
-Delivery Semantic Guarantees.
- * Currently, window function does not support `MANUAL` and `Effectively-once` delivery semantics.
:::
@@ -186,11 +183,11 @@ Tumbling window assigns elements to a window of a specified time length or count
In a tumbling window with a count-based trigger policy, as illustrated in the following example, the trigger policy is set to 2. Each function is triggered and executed when two items are in the window, regardless of the time.
-
+
In contrast, as illustrated in the following example, the window length of the tumbling window is 10 seconds, which means the function is triggered when the 10-second time interval has elapsed, regardless of how many events are in the window.
-
+
#### Sliding window
@@ -198,4 +195,4 @@ The sliding window method defines a fixed window length by setting the eviction
As illustrated in the following example, the window length is 2 seconds, which means that any data older than 2 seconds will be evicted and not used in the computation. The sliding interval is configured to be 1 second, which means that function is executed every second to process the data within the entire window length.
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/site2/docs/functions-overview.md b/site2/docs/functions-overview.md
index 03cbd123127..59ab123f3d9 100644
--- a/site2/docs/functions-overview.md
+++ b/site2/docs/functions-overview.md
@@ -5,7 +5,7 @@ sidebar_label: "Overview"
---
This section introduces the following content:
-* [What is Pulsar Functions](#what-is-pulsar-functions)
+* [What are Pulsar Functions](#what-are-pulsar-functions)
* [Why use Puslar Functions](#why-use-pulsar-functions)
* [Use cases](#use-cases)
* [User flow](#user-flow)
@@ -18,18 +18,18 @@ Pulsar Functions are a serverless computing framework that runs on top of Pulsar
* applies a user-defined processing logic to the messages,
* publishes the outputs of the messages to other topics.
-The following figure illustrates the computing process of a function.
+The diagram below illustrates the three steps in the functions computing process.

-A function receives messages from one or more **input topics**. Each time messages are received, the function completes the following steps:
-1. Consumes the messages in the input topics.
-2. Applies a customized processing logic to the messages and:
+Each time a function receives a message, it completes the following consume-apply-publish steps.
+1. Consumes the message from one or more **input topics**.
+2. Applies the customized (user-supplied) processing logic to the message.
+3. Publishes the output of the message, including:
a) writes output messages to an **output topic** in Pulsar
- b) writes logs to a **log topic** if it is configured (for debugging purposes)
- c) writes [state](functions-develop-state.md) to BookKeeper (if it is configured)
-
-
+ b) writes logs to a **log topic** (if it is configured)for debugging
+ c) writes [state](functions-develop-state.md) updates to BookKeeper (if it is configured)
+
You can write functions in Java, Python, and Go. For example, you can use Pulsar Functions to set up the following processing chain:
* A Python function listens for the `raw-sentences` topic and "sanitizes" incoming strings (removing extraneous white space and converting all characters to lowercase) and then publishes the results to a `sanitized-sentences` topic.
* A Java function listens for the `sanitized-sentences` topic, counts the number of times each word appears within a specified time [window](functions-concepts.md#window-function), and publishes the results to a `results` topic.
@@ -40,47 +40,47 @@ See [Develop Pulsar Functions](functions-develop.md) for more details.
## Why use Pulsar Functions
-Pulsar Functions provide the capabilities to perform simple computations on the messages before they are routed to consumers.
+Pulsar Functions perform simple computations on messages before routing the messages to consumers. These Lambda-style functions are specifically designed and integrated with Pulsar. The framework provides a simple computing framework on your Pulsar cluster and takes care of the underlying details of sending and receiving messages. You only need to focus on the business logic.
-Pulsar Functions can be characterized as Lambda-style functions that are specifically designed and integrated with Pulsar as the underlying message bus. The framework of Pulsar Functions provides a simple computing framework on your Pulsar cluster and takes care of the underlying details of sending/receiving messages. You only need to focus on the business logic and run it as Pulsar Functions to maximize the value of your data and enjoy the benefits of:
+Pulsar Functions enable your organization to maximize the value of your data and enjoy the benefits of:
* Simplified deployment and operations - you can create a data pipeline without deploying a separate Stream Processing Engine (SPE), such as [Apache Storm](http://storm.apache.org/), [Apache Heron](https://heron.incubator.apache.org/), or [Apache Flink](https://flink.apache.org/).
-* Serverless computing (when Kubernetes runtime is used)
+* Serverless computing (when you use Kubernetes runtime)
* Maximized developer productivity (both language-native interfaces and SDKs for Java/Python/Go).
* Easy troubleshooting
-
## Use cases
-Here are two real-world use cases to help you understand the capabilities of Pulsar Functions and what they can be used for.
+Below are two simple examples of use cases for Pulsar Functions.
### Word count example
-This figure illustrates the process of implementing the classic word count example using Pulsar Functions. It calculates a sum of the occurrences of every individual word published to a given topic.
+This figure shows the process of implementing the classic word count use case.

-### Content-based routing example
+In this example, the function calculates a sum of the occurrences of every individual word published to a given topic.
-For example, a function takes items (strings) as input and publishes them to either a `fruits` or `vegetables` topic, depending on the item. If an item is neither fruit nor vegetable, a warning is logged to a [log topic](functions-develop-log.md).
+### Content-based routing example
-This figure demonstrates the process of implementing a content-based routing using Pulsar Functions.
+This figure demonstrates the process of implementing a content-based routing use case.

-## User flow
+In this example, a function takes items (strings) as input and publishes them to either a `fruits` or `vegetables` topic, depending on the item. If an item is neither fruit nor vegetable, a warning is logged to a [log topic](functions-develop-log.md).
-**Admins/operators**
-1. [Set up function workers](functions-worker.md).
-2. [Configure function runtime](functions-runtime.md).
-3. [Deploy a function](functions-deploy.md).
+## What's next?
-**Developers**
+* [Function concepts](functions-concepts.md)
+* [Function CLIs and configs](functions-cli.md)
+
+**For developers**
1. [Develop a function](functions-develop.md).
2. [Debug a function](functions-debug.md).
3. [Package a function](functions-package.md).
4. [Deploy a function](functions-deploy.md).
-**More reference**
-* [Function concepts](functions-concepts.md)
-* [Function CLIs and configs](functions-cli.md)
+**For admins/operators**
+1. [Set up function workers](functions-worker.md).
+2. [Configure function runtime](functions-runtime.md).
+3. [Deploy a function](functions-deploy.md).
diff --git a/site2/docs/functions-runtime-process.md b/site2/docs/functions-runtime-process.md
index 2082c89f3a2..a5c7fe76385 100644
--- a/site2/docs/functions-runtime-process.md
+++ b/site2/docs/functions-runtime-process.md
@@ -22,3 +22,5 @@ functionRuntimeFactoryConfigs:
extraFunctionDependenciesDir:
```
+
+For more details, see [code](https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/process/ProcessRuntimeFactoryConfig.java).
\ No newline at end of file
diff --git a/site2/docs/functions-runtime-thread.md b/site2/docs/functions-runtime-thread.md
index dce8a7fbfa2..edcb5973f59 100644
--- a/site2/docs/functions-runtime-thread.md
+++ b/site2/docs/functions-runtime-thread.md
@@ -4,7 +4,9 @@ title: Configure thread runtime
sidebar_label: "Configure thread runtime"
---
-You can use the default configurations of thread runtime in the `conf/functions_worker.yml` file. If you want to customize parameters, such as thread group name, refer to the following example.
+You can use the default configurations of thread runtime in the `conf/functions_worker.yml` file.
+
+If you want to customize more parameters, such as thread group name, refer to the following example.
```yaml
@@ -31,4 +33,6 @@ functionRuntimeFactoryConfigs:
If `absoluteValue` and `percentOfMaxDirectMemory` are both set, the smaller value is used.
-:::
\ No newline at end of file
+:::
+
+For more details, see [code](https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntimeFactoryConfig.java).
\ No newline at end of file
diff --git a/site2/docs/io-overview.md b/site2/docs/io-overview.md
index 04b096de709..fe132c598ef 100644
--- a/site2/docs/io-overview.md
+++ b/site2/docs/io-overview.md
@@ -159,5 +159,5 @@ For more information about the options of `pulsar-admin sinks update`, see [here
You can manage Pulsar connectors (for example, create, update, start, stop, restart, reload, delete and perform other operations on connectors) via the `Connector Admin CLI` with sources and sinks subcommands. For the latest and complete information, see [Pulsar admin docs](/tools/pulsar-admin/).
-Connectors (sources and sinks) and Functions are components of instances, and they all run on Functions workers. When managing a source, sink or function via the `Connector Admin CLI` or `Functions Admin CLI`, an instance is started on a worker. For more information, see [Functions worker](functions-worker-run-separately.md).
+Connectors (sources and sinks) and Functions are components of instances, and they all run on Functions workers. When managing a source, sink or function via the `Connector Admin CLI` or `Functions Admin CLI`, an instance is started on a worker. For more information, see [Functions worker](functions-worker.md).