[pulsar] branch master updated: [improve][doc] Functions doc improvements Phase2 (#16172)

liuyu Thu, 18 Aug 2022 20:16:36 -0700

This is an automated email from the ASF dual-hosted git repository.

liuyu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git



The following commit(s) were added to refs/heads/master by this push:
     new 5da9be9f558 [improve][doc] Functions doc improvements Phase2 (#16172)
5da9be9f558 is described below

commit 5da9be9f5582256eea4e5b62d476c5f038eea0bf
Author: momo-jun <[email protected]>
AuthorDate: Fri Aug 19 11:16:20 2022 +0800

    [improve][doc] Functions doc improvements Phase2 (#16172)
---
 .../function-count-based-tumbling-window.png       | Bin 156363 -> 0 bytes
 .../function-count-based-tumbling-window.svg       |   1 +
 site2/docs/assets/function-data-window.png         | Bin 334827 -> 0 bytes
 site2/docs/assets/function-data-window.svg         |   1 +
 site2/docs/assets/function-sliding-window.png      | Bin 150032 -> 0 bytes
 site2/docs/assets/function-sliding-window.svg      |   1 +
 .../assets/function-time-based-tumbling-window.png | Bin 150774 -> 0 bytes
 .../assets/function-time-based-tumbling-window.svg |   1 +
 site2/docs/functions-cli.md                        |   2 +-
 site2/docs/functions-concepts.md                   |  21 ++++----
 site2/docs/functions-overview.md                   |  54 ++++++++++-----------
 site2/docs/functions-runtime-process.md            |   2 +
 site2/docs/functions-runtime-thread.md             |   8 ++-
 site2/docs/io-overview.md                          |   2 +-
 14 files changed, 50 insertions(+), 43 deletions(-)

diff --git a/site2/docs/assets/function-count-based-tumbling-window.png 
b/site2/docs/assets/function-count-based-tumbling-window.png
deleted file mode 100644
index c5ad6b613c5..00000000000
Binary files a/site2/docs/assets/function-count-based-tumbling-window.png and 
/dev/null differ
diff --git a/site2/docs/assets/function-count-based-tumbling-window.svg 
b/site2/docs/assets/function-count-based-tumbling-window.svg
new file mode 100644
index 00000000000..1d5d104d5dd
--- /dev/null
+++ b/site2/docs/assets/function-count-based-tumbling-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg" 
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="909.87" 
height="343.91"><g transform="translate(-17.333333333294433 
-456.5918872073414)" lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z" 
fill="#fff"/><path d="M195.33 569.26a6 6 0 0 1 6-6h102.25a6 6 0 0 1 6 6v92.17a6 
6 0 0 1-6 6H201.33a6 6 0 0 1-6-6z" fill="#fff"/><path d="M199.04 
563.74l.44-.18.9-.22h1.03m4.05 0h2.03m4.04 0h2.03m4.04-.02h2.02m4.04 
0h2.03m4.03 0h2.03m4. [...]
\ No newline at end of file
diff --git a/site2/docs/assets/function-data-window.png 
b/site2/docs/assets/function-data-window.png
deleted file mode 100644
index 45c3cfca894..00000000000
Binary files a/site2/docs/assets/function-data-window.png and /dev/null differ
diff --git a/site2/docs/assets/function-data-window.svg 
b/site2/docs/assets/function-data-window.svg
new file mode 100644
index 00000000000..ed2c5ac53eb
--- /dev/null
+++ b/site2/docs/assets/function-data-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg" 
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="1196.89" 
height="438.72"><g transform="translate(-61.99999999996112 -362.8796296530241)" 
lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z" fill="#fff"/><path 
d="M254 505.43a6 6 0 0 1 6-6h844.98a6 6 0 0 1 6 6v156a6 6 0 0 1-6 6H260a6 6 0 0 
1-6-6z" fill="#fff"/><path d="M255 505.43c0 .56-.45 1-1 1s-1-.44-1-1c0-.55.45-1 
1-1s1 .45 1 1zm1.45-3.88c0 .55-.45 1-1 1s-1-.45-1-1 .45- [...]
\ No newline at end of file
diff --git a/site2/docs/assets/function-sliding-window.png 
b/site2/docs/assets/function-sliding-window.png
deleted file mode 100644
index bf66a761819..00000000000
Binary files a/site2/docs/assets/function-sliding-window.png and /dev/null 
differ
diff --git a/site2/docs/assets/function-sliding-window.svg 
b/site2/docs/assets/function-sliding-window.svg
new file mode 100644
index 00000000000..efdac727165
--- /dev/null
+++ b/site2/docs/assets/function-sliding-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg" 
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="998.67" 
height="397.78"><g transform="translate(-566.6666666666472 
-336.30936180364023)" lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z" 
fill="#fff"/><path d="M751.92 517.84h40v60h-40z" stroke="#474e55" 
stroke-width="2" fill="#188fff"/><g fill="none"><path d="M1196.67 
363.4v79.68h-458V363.4z"/><path d="M738.67 443.08c0-15.8 12.8-28.62 
28.62-28.62h171.74c15.8 0 28.63-12. [...]
\ No newline at end of file
diff --git a/site2/docs/assets/function-time-based-tumbling-window.png 
b/site2/docs/assets/function-time-based-tumbling-window.png
deleted file mode 100644
index 610347e2e5b..00000000000
Binary files a/site2/docs/assets/function-time-based-tumbling-window.png and 
/dev/null differ
diff --git a/site2/docs/assets/function-time-based-tumbling-window.svg 
b/site2/docs/assets/function-time-based-tumbling-window.svg
new file mode 100644
index 00000000000..4ef6660fdc8
--- /dev/null
+++ b/site2/docs/assets/function-time-based-tumbling-window.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg" 
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:lucid="lucid" width="1196.89" 
height="364.2"><g transform="translate(-61.99999999996112 -437.3961096847811)" 
lucid:page-tab-id="0_0"><path d="M0 0h1870.87v1322.83H0z" fill="#fff"/><path 
d="M234 569.26a6 6 0 0 1 6-6h435.92a6 6 0 0 1 6 6v92.17a6 6 0 0 1-6 6H240a6 6 0 
0 1-6-6z" fill="#fff"/><path d="M237.7 563.74l.45-.18.9-.22h1m4 0h2m3.98 0h2m4 
0H258m4 0h2m4 0h1.98m4 0h1.98m4 0h2m3.98 0h2m3.98 0h2m4 0h [...]
\ No newline at end of file
diff --git a/site2/docs/functions-cli.md b/site2/docs/functions-cli.md
index c2c4f3ee9fa..42bd38dc310 100644
--- a/site2/docs/functions-cli.md
+++ b/site2/docs/functions-cli.md
@@ -46,7 +46,7 @@ You can configure a function by using a predefined YAML file. 
The following tabl
 | userConfig           | Map`<String,Object>`         | `--user-config`        
         | User-defined config key/values. |
 | secrets       | Map`<String,Object>` | `--secrets`   | The mapping from 
secretName to objects that encapsulate how the secret is fetched by the 
underlying secrets provider. |
 | runtime       | String             | N/A          | The runtime of a 
function. Available values: `java`,`python`, `go`. |
-| autoAck       | Boolean            | `--auto-ack` | Whether the framework 
acknowledges messages automatically or not. <br /><br />**Tip**: This 
configuration will be deprecated. If the user specifies delivery semantics, the 
framework will automatically ack messages. If you do not want the framework to 
ack messages, set the **processingGuarantees** to `MANUAL`. |
+| autoAck       | Boolean            | `--auto-ack` | Whether the framework 
acknowledges messages automatically or not. <br /><br />**Note**: This 
configuration will be deprecated in future releases. If you specify a delivery 
semantic, the framework automatically acknowledges messages. If you do not want 
the framework to auto-ack messages, set the `processingGuarantees` to `MANUAL`. 
|
 | maxMessageRetries    | Int      |    `--max-message-retries` | The number of 
retries to process a message before giving up. |
 | deadLetterTopic      | String   | `--dead-letter-topic`   | The topic used 
for storing messages that are not processed successfully. |
 | subName              | String   | `--subs-name`           | The name of 
Pulsar source subscription used for input-topic consumers if required.|
diff --git a/site2/docs/functions-concepts.md b/site2/docs/functions-concepts.md
index cfbba46aaf2..4f40f8ec7be 100644
--- a/site2/docs/functions-concepts.md
+++ b/site2/docs/functions-concepts.md
@@ -75,10 +75,10 @@ Pulsar provides three different messaging delivery 
semantics that you can apply
 
 | Delivery semantics | Description | Adopted subscription type |
 |--------------------|-------------|---------------------------|
-| **At-most-once** delivery | Each message sent to a function is processed at 
its best effort. There’s no guarantee that the message will be processed or 
not. <br /><br /> When setting At-most-once, the `autoAck` configuration must 
be equal to true, otherwise the startup will fail(`autoAck` configuration will 
be deprecated in future releases). <br/><br/> **Ack time node**: Before 
function processing. | Shared |
-| **At-least-once** delivery (default) | Each message sent to the function can 
be processed more than once (in case of a processing failure or redelivery).<br 
/><br />If you create a function without specifying the 
`--processing-guarantees` flag, the function provides `at-least-once` delivery 
guarantee. <br/><br/> **Ack time node**: After sending a message to output. | 
Shared |
-| **Effectively-once** delivery | Each message sent to the function can be 
processed more than once but it has only one output. Duplicated messages are 
ignored.<br /><br />`Effectively once` is achieved on top of `at-least-once` 
processing and guaranteed server-side deduplication. This means a state update 
can happen twice, but the same state update is only applied once, the other 
duplicated state update is discarded on the server-side. <br/><br/> **Ack time 
node**: After sending a messa [...]
-| **Manual** delivery | Under this semantics, the user needs to call the 
method `context.getCurrentRecord().ack()` inside the function to manually 
perform the ack operation, and the framework will not help users to do any ack 
operations. <br/><br/> **Ack time node**: User decides, in function method. | 
Shared |
+| **At-most-once** delivery | Each message sent to a function is processed at 
its best effort. There’s no guarantee that the message will be processed or 
not. <br /><br /> When you select this semantic, the `autoAck` configuration 
must be set to `true`, otherwise the startup will fail (the `autoAck` 
configuration will be deprecated in future releases). <br /><br /> **Ack time 
node**: Before function processing. | Shared |
+| **At-least-once** delivery (default) | Each message sent to a function can 
be processed more than once (in case of a processing failure or redelivery).<br 
/><br />If you create a function without specifying the 
`--processing-guarantees` flag, the function provides `at-least-once` delivery 
guarantee. <br /><br /> **Ack time node**: After sending a message to output. | 
Shared |
+| **Effectively-once** delivery | Each message sent to a function can be 
processed more than once but it has only one output. Duplicated messages are 
ignored.<br /><br />`Effectively once` is achieved on top of `at-least-once` 
processing and guaranteed server-side deduplication. This means a state update 
can happen twice, but the same state update is only applied once, the other 
duplicated state update is discarded on the server-side. <br /><br /> **Ack 
time node**: After sending a messa [...]
+| **Manual** delivery | When you select this semantic, the framework does not 
perform any ack operations, and you need to call the method 
`context.getCurrentRecord().ack()` inside a function to manually perform the 
ack operation. <br /><br /> **Ack time node**: User-defined within function 
methods. | Shared |
 
 
 :::tip
@@ -149,13 +149,13 @@ Pulsar Functions take byte arrays as inputs and spit out 
byte arrays as output.
 
 :::note    
 
-Currently, window function is only available in Java.
+Currently, window function is only available in Java, and does not support 
`MANUAL` and  `Effectively-once` delivery semantics.
 
 :::
 
 Window function is a function that performs computation across a data window, 
that is, a finite subset of the event stream. As illustrated below, the stream 
is split into “buckets” where functions can be applied.
 
-![A window of data within an event stream](/assets/function-data-window.png)
+![A window of data within an event stream](/assets/function-data-window.svg)
 
 The definition of a data window for a function involves two policies:
 * Eviction policy: Controls the amount of data collected in a window. 
@@ -168,9 +168,6 @@ Both trigger policy and eviction policy are driven by 
either time or count.
 Both processing time and event time are supported.
  * Processing time is defined based on the wall time when the function 
instance builds and processes a window. The judging of window completeness is 
straightforward and you don’t have to worry about data arrival disorder. 
  * Event time is defined based on the timestamps that come with the event 
record. It guarantees event time correctness but also offers more data 
buffering and a limited completeness guarantee.
-
-Delivery Semantic Guarantees.
- * Currently, window function does not support `MANUAL` and  
`Effectively-once` delivery semantics.
    
 :::
 
@@ -186,11 +183,11 @@ Tumbling window assigns elements to a window of a 
specified time length or count
 
 In a tumbling window with a count-based trigger policy, as illustrated in the 
following example, the trigger policy is set to 2. Each function is triggered 
and executed when two items are in the window, regardless of the time. 
 
-![A tumbling window with a count-based trigger 
policy](/assets/function-count-based-tumbling-window.png)
+![A tumbling window with a count-based trigger 
policy](/assets/function-count-based-tumbling-window.svg)
 
 In contrast, as illustrated in the following example, the window length of the 
tumbling window is 10 seconds, which means the function is triggered when the 
10-second time interval has elapsed, regardless of how many events are in the 
window. 
 
-![A tumbling window with a time-based trigger 
policy](/assets/function-time-based-tumbling-window.png)
+![A tumbling window with a time-based trigger 
policy](/assets/function-time-based-tumbling-window.svg)
 
 #### Sliding window
 
@@ -198,4 +195,4 @@ The sliding window method defines a fixed window length by 
setting the eviction
 
 As illustrated in the following example, the window length is 2 seconds, which 
means that any data older than 2 seconds will be evicted and not used in the 
computation. The sliding interval is configured to be 1 second, which means 
that function is executed every second to process the data within the entire 
window length. 
 
-![Sliding window with an overlap](/assets/function-sliding-window.png)
\ No newline at end of file
+![Sliding window with an overlap](/assets/function-sliding-window.svg)
\ No newline at end of file
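
The window behaviour described in the functions-concepts.md changes above (a count-based tumbling window with a trigger of 2, and a sliding window with a 2-second length and a 1-second sliding interval) can be illustrated with a small standalone simulation. This is only a sketch in plain Python, not Pulsar's implementation: the function names `count_tumbling_windows` and `sliding_windows` are hypothetical, and the choice to treat the window as the half-open interval (t - length, t] is an assumption made for the illustration.

```python
from typing import Iterable, List, Tuple


def count_tumbling_windows(events: Iterable[str], size: int) -> List[List[str]]:
    """Group events into non-overlapping windows of `size` items each.

    Hypothetical helper: a window is emitted as soon as it holds `size`
    items, regardless of time. A trailing, incomplete window is not emitted.
    """
    windows: List[List[str]] = []
    current: List[str] = []
    for event in events:
        current.append(event)
        if len(current) == size:  # count-based trigger fires
            windows.append(current)
            current = []          # tumbling: the next window starts empty
    return windows


def sliding_windows(
    events: List[Tuple[float, str]], length: float, interval: float, end: float
) -> List[Tuple[float, List[str]]]:
    """Evaluate a time-based sliding window over (timestamp, value) pairs.

    Every `interval` seconds (up to `end`), collect the values whose
    timestamp lies in (t - length, t]. Older values are evicted.
    The boundary handling is an assumption of this sketch.
    """
    results: List[Tuple[float, List[str]]] = []
    t = interval
    while t <= end:
        window = [value for ts, value in events if t - length < ts <= t]
        results.append((t, window))
        t += interval
    return results


if __name__ == "__main__":
    print(count_tumbling_windows(["a", "b", "c", "d", "e"], size=2))
    print(sliding_windows([(0.5, "a"), (1.5, "b"), (2.5, "c")],
                          length=2, interval=1, end=3))
```

In the sliding case, consecutive windows overlap: an event that arrives at 1.5 seconds is processed both at the 2-second and at the 3-second trigger, which is the overlap the sliding-window figure depicts.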
diff --git a/site2/docs/functions-overview.md b/site2/docs/functions-overview.md
index 03cbd123127..59ab123f3d9 100644
--- a/site2/docs/functions-overview.md
+++ b/site2/docs/functions-overview.md
@@ -5,7 +5,7 @@ sidebar_label: "Overview"
 ---
 
 This section introduces the following content:
-* [What is Pulsar Functions](#what-is-pulsar-functions)
+* [What are Pulsar Functions](#what-are-pulsar-functions)
 * [Why use Puslar Functions](#why-use-pulsar-functions)
 * [Use cases](#use-cases)
 * [User flow](#user-flow)
@@ -18,18 +18,18 @@ Pulsar Functions are a serverless computing framework that 
runs on top of Pulsar
 * applies a user-defined processing logic to the messages,
 * publishes the outputs of the messages to other topics.
 
-The following figure illustrates the computing process of a function. 
+The diagram below illustrates the three steps in the functions computing 
process. 
 
 ![Pulsar Functions execute user-defined code on data published to Pulsar 
topics](/assets/function-overview.svg)
 
-A function receives messages from one or more **input topics**. Each time 
messages are received, the function completes the following steps:
-1. Consumes the messages in the input topics.
-2. Applies a customized processing logic to the messages and:
+Each time a function receives a message, it completes the following 
consume-apply-publish steps.
+1. Consumes the message from one or more **input topics**. 
+2. Applies the customized (user-supplied) processing logic to the message.
+3. Publishes the output of the message, including:
     a) writes output messages to an **output topic** in Pulsar
-    b) writes logs to a **log topic** if it is configured (for debugging 
purposes)
-    c) writes [state](functions-develop-state.md) to BookKeeper (if it is 
configured) 
-
-
+    b) writes logs to a **log topic** (if it is configured) for debugging
+    c) writes [state](functions-develop-state.md) updates to BookKeeper (if it 
is configured) 
+    
 You can write functions in Java, Python, and Go. For example, you can use 
Pulsar Functions to set up the following processing chain:
 * A Python function listens for the `raw-sentences` topic and "sanitizes" 
incoming strings (removing extraneous white space and converting all characters 
to lowercase) and then publishes the results to a `sanitized-sentences` topic.
 * A Java function listens for the `sanitized-sentences` topic, counts the 
number of times each word appears within a specified time 
[window](functions-concepts.md#window-function), and publishes the results to a 
`results` topic.
@@ -40,47 +40,47 @@ See [Develop Pulsar Functions](functions-develop.md) for 
more details.
 
 ## Why use Pulsar Functions
 
-Pulsar Functions provide the capabilities to perform simple computations on 
the messages before they are routed to consumers. 
+Pulsar Functions perform simple computations on messages before routing the 
messages to consumers. These Lambda-style functions are specifically designed 
and integrated with Pulsar. The framework provides a simple computing framework 
on your Pulsar cluster and takes care of the underlying details of sending and 
receiving messages. You only need to focus on the business logic.
 
-Pulsar Functions can be characterized as Lambda-style functions that are 
specifically designed and integrated with Pulsar as the underlying message bus. 
The framework of Pulsar Functions provides a simple computing framework on your 
Pulsar cluster and takes care of the underlying details of sending/receiving 
messages. You only need to focus on the business logic and run it as Pulsar 
Functions to maximize the value of your data and enjoy the benefits of:
+Pulsar Functions enable your organization to maximize the value of your data 
and enjoy the benefits of:
 * Simplified deployment and operations - you can create a data pipeline 
without deploying a separate Stream Processing Engine (SPE), such as [Apache 
Storm](http://storm.apache.org/), [Apache 
Heron](https://heron.incubator.apache.org/), or [Apache 
Flink](https://flink.apache.org/).
-* Serverless computing (when Kubernetes runtime is used)
+* Serverless computing (when you use Kubernetes runtime)
 * Maximized developer productivity (both language-native interfaces and SDKs 
for Java/Python/Go).
 * Easy troubleshooting
 
-
 ## Use cases
 
-Here are two real-world use cases to help you understand the capabilities of 
Pulsar Functions and what they can be used for.
+Below are two simple examples of use cases for Pulsar Functions.
 
 ### Word count example
 
-This figure illustrates the process of implementing the classic word count 
example using Pulsar Functions. It calculates a sum of the occurrences of every 
individual word published to a given topic.
+This figure shows the process of implementing the classic word count use case.
 
 ![Word count example using Pulsar 
Functions](/assets/pulsar-functions-word-count.png)
 
-### Content-based routing example
+In this example, the function calculates a sum of the occurrences of every 
individual word published to a given topic.
 
-For example, a function takes items (strings) as input and publishes them to 
either a `fruits` or `vegetables` topic, depending on the item. If an item is 
neither fruit nor vegetable, a warning is logged to a [log 
topic](functions-develop-log.md).
+### Content-based routing example
 
-This figure demonstrates the process of implementing a content-based routing 
using Pulsar Functions. 
+This figure demonstrates the process of implementing a content-based routing 
use case. 
 
 ![Count-based routing example using Pulsar 
Functions](/assets/pulsar-functions-routing-example.png)
 
-## User flow
+In this example, a function takes items (strings) as input and publishes them 
to either a `fruits` or `vegetables` topic, depending on the item. If an item 
is neither fruit nor vegetable, a warning is logged to a [log 
topic](functions-develop-log.md).
 
-**Admins/operators**
-1. [Set up function workers](functions-worker.md).
-2. [Configure function runtime](functions-runtime.md). 
-3. [Deploy a function](functions-deploy.md).
+## What's next?
 
-**Developers**
+* [Function concepts](functions-concepts.md)
+* [Function CLIs and configs](functions-cli.md)
+
+**For developers**
 1. [Develop a function](functions-develop.md).
 2. [Debug a function](functions-debug.md).
 3. [Package a function](functions-package.md).
 4. [Deploy a function](functions-deploy.md).
 
-**More reference**
-* [Function concepts](functions-concepts.md)
-* [Function CLIs and configs](functions-cli.md)
+**For admins/operators**
+1. [Set up function workers](functions-worker.md).
+2. [Configure function runtime](functions-runtime.md). 
+3. [Deploy a function](functions-deploy.md).
 
diff --git a/site2/docs/functions-runtime-process.md 
b/site2/docs/functions-runtime-process.md
index 2082c89f3a2..a5c7fe76385 100644
--- a/site2/docs/functions-runtime-process.md
+++ b/site2/docs/functions-runtime-process.md
@@ -22,3 +22,5 @@ functionRuntimeFactoryConfigs:
   extraFunctionDependenciesDir:
 
 ```
+
+For more details, see 
[code](https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/process/ProcessRuntimeFactoryConfig.java).
\ No newline at end of file
diff --git a/site2/docs/functions-runtime-thread.md 
b/site2/docs/functions-runtime-thread.md
index dce8a7fbfa2..edcb5973f59 100644
--- a/site2/docs/functions-runtime-thread.md
+++ b/site2/docs/functions-runtime-thread.md
@@ -4,7 +4,9 @@ title: Configure thread runtime
 sidebar_label: "Configure thread runtime"
 ---
 
-You can use the default configurations of thread runtime in the 
`conf/functions_worker.yml` file. If you want to customize parameters, such as 
thread group name, refer to the following example.
+You can use the default configurations of thread runtime in the 
`conf/functions_worker.yml` file. 
+
+If you want to customize more parameters, such as thread group name, refer to 
the following example.
 
 ```yaml
 
@@ -31,4 +33,6 @@ functionRuntimeFactoryConfigs:
 
 If `absoluteValue` and `percentOfMaxDirectMemory` are both set, the smaller 
value is used.
 
-:::
\ No newline at end of file
+:::
+
+For more details, see 
[code](https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntimeFactoryConfig.java).
\ No newline at end of file
diff --git a/site2/docs/io-overview.md b/site2/docs/io-overview.md
index 04b096de709..fe132c598ef 100644
--- a/site2/docs/io-overview.md
+++ b/site2/docs/io-overview.md
@@ -159,5 +159,5 @@ For more information about the options of `pulsar-admin 
sinks update`, see [here
 
 You can manage Pulsar connectors (for example, create, update, start, stop, 
restart, reload, delete and perform other operations on connectors) via the 
`Connector Admin CLI` with sources and sinks subcommands. For the latest and 
complete information, see [Pulsar admin docs](/tools/pulsar-admin/).
 
-Connectors (sources and sinks) and Functions are components of instances, and 
they all run on Functions workers. When managing a source, sink or function via 
the `Connector Admin CLI` or `Functions Admin CLI`, an instance is started on a 
worker. For more information, see [Functions 
worker](functions-worker-run-separately.md).
+Connectors (sources and sinks) and Functions are components of instances, and 
they all run on Functions workers. When managing a source, sink or function via 
the `Connector Admin CLI` or `Functions Admin CLI`, an instance is started on a 
worker. For more information, see [Functions worker](functions-worker.md).
