This is an automated email from the ASF dual-hosted git repository.
mmerli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new 125ca58 Pulsar Functions diagrams (#1502)
125ca58 is described below
commit 125ca585da6b02ae35c9681ab96aa302f9efdaa0
Author: Luc Perkins <[email protected]>
AuthorDate: Fri Apr 6 11:53:57 2018 -0700
Pulsar Functions diagrams (#1502)
* Add brief section to overview on processing guarantees
* add new sections to overview
* add basic PF diagram
* add word count diagram
* add two new examples to overview
* switch from python to java example
* Update description of counters
---
site/_sass/_docs.scss | 2 +-
site/docs/latest/functions/guarantees.md | 2 +-
site/docs/latest/functions/overview.md | 139 ++++++++++++++++++++++++--
site/img/pulsar-functions-overview.png | Bin 0 -> 77077 bytes
site/img/pulsar-functions-routing-example.png | Bin 0 -> 62087 bytes
site/img/pulsar-functions-word-count.png | Bin 0 -> 85250 bytes
6 files changed, 133 insertions(+), 10 deletions(-)
diff --git a/site/_sass/_docs.scss b/site/_sass/_docs.scss
index 42277f0..a66bd16 100644
--- a/site/_sass/_docs.scss
+++ b/site/_sass/_docs.scss
@@ -195,7 +195,7 @@
border-bottom: 1px solid black;
tr th {
- padding-right: $table-right-padding;
+ padding: 0 $table-right-padding .5rem 0;
}
}
diff --git a/site/docs/latest/functions/guarantees.md
b/site/docs/latest/functions/guarantees.md
index 541ac55..f6adc31 100644
--- a/site/docs/latest/functions/guarantees.md
+++ b/site/docs/latest/functions/guarantees.md
@@ -10,7 +10,7 @@ Delivery semantics | Description
:------------------|:-------
**At-most-once** delivery | Each message that is sent to the function will
most likely be processed but also may not be (hence the "at most")
**At-least-once** delivery | Each message that is sent to the function could
be processed more than once (hence the "at least")
-**Effectively-once** delivery | Each message that is sent to the function will
have one output associated with it. The function may be invoked more than once,
perhaps due to some kind of system failure, but the function will produce one
effect for each incoming message.
+**Effectively-once** delivery | Each message that is sent to the function will
have one output associated with it
## Applying processing guarantees to a function
diff --git a/site/docs/latest/functions/overview.md
b/site/docs/latest/functions/overview.md
index 764515d..9629856 100644
--- a/site/docs/latest/functions/overview.md
+++ b/site/docs/latest/functions/overview.md
@@ -10,7 +10,7 @@ preview: true
* apply a user-supplied processing logic to each message,
* publish the results of the computation to another topic
-Here's an example Pulsar Function for Java:
+Here's an example Pulsar Function for Java (using the [native
interface](../api#java-native)):
```java
import java.util.Function;
@@ -27,9 +27,9 @@ Functions are executed each time a message is published to
the input topic. If a
The core goal behind Pulsar Functions is to enable you to easily create
processing logic of any level of complexity without needing to deploy a
separate neighboring system (such as [Apache Storm](http://storm.apache.org/),
[Apache Heron](https://apache.github.io/incubator-heron), [Apache
Flink](https://flink.apache.org/), etc.). Pulsar Functions is essentially
ready-made compute infrastructure at your disposal as part of your Pulsar
messaging system. This core goal is tied to a series of [...]
-* Developer productive ([language-native](#native) vs. [Pulsar Functions
SDK](#sdk) functions)
-* easy troubleshooting
-* Operational simplicity (no need for an external system)
+* Developer productivity ([language-native](#native) vs. [Pulsar Functions
SDK](#sdk) functions)
+* Easy troubleshooting
+* Operational simplicity (no need for an external processing system)
## Inspirations
@@ -41,7 +41,98 @@ The Pulsar Functions feature was inspired by (and takes cues
from) several syste
Pulsar Functions could be described as
* [Lambda](https://aws.amazon.com/lambda/)-style functions that are
-* specifically designed to work with Pulsar
+* specifically designed to use Pulsar as a message bus
+
+## Programming model
+
+The core programming model behind Pulsar Functions is very simple:
+
+* Functions receive messages from one or more **input {% popover topics %}**.
Every time a message is received, the function can do a variety of things:
+ * Apply some processing logic to the input and write output to:
+ * An **output topic** in Pulsar
+ * [Apache BookKeeper](#state-storage)
+ * Write logs to a **log topic** (potentially for debugging purposes)
+ * Increment a [counter](#counters)
+
+
+
+### Word count example {#word-count}
+
+If you were to implement the classic word count example using Pulsar
Functions, it might look something like this:
+
+
+
+Here, sentences are produced on the `sentences` topic. The Pulsar Function
listens on that topic and whenever a message arrives it splits the sentence up
into individual words and increments a [counter](#counters) for each word every
time that word is encountered. The value of that counter is then available to
all [instances](#parallelism) of the function.
+
+If you were writing the function in [Java](../api#java) using the [Pulsar
Functions SDK for Java](../api#java-sdk), you could write the function like
this...
+
+```java
+package org.example.functions;
+
+import org.apache.pulsar.functions.api.Context;
+import org.apache.pulsar.functions.api.Function;
+
+import java.util.Arrays;
+
+public class WordCountFunction implements Function<String, Void> {
+ // This function is invoked every time a message is published to the input
topic
+ @Override
+ public Void process(String input, Context context) {
+ Arrays.asList(input.split(" ")).forEach(word -> {
+ String counterKey = word.toLowerCase();
+ context.incrCounter(counterKey, 1)
+ });
+ return null;
+ }
+}
+```
+
+...and then [deploy it](#cluster-mode) in your Pulsar cluster using the
[command line](#cli) like this:
+
+```bash
+$ bin/pulsar-admin functions create \
+ --jar target/my-jar-with-dependencies.jar \
+ --className org.example.functions.WordCountFunction \
+ --tenant sample \
+ --namespace ns1 \
+ --name word-count \
+ --inputs persistent://sample/standalone/ns1/sentences \
+ --output persistent://sample/standalone/ns1/count
+```
+
+### Content-based routing example {#content}
+
+The use cases for Pulsar Functions are essentially endless, but let's dig into
a more sophisticated example that involves content-based routing.
+
+Imagine a function that takes items (strings) as input and publishes them to
either a fruits or vegetables topic, depending on the item. Or, if an item is
neither a fruit nor a vegetable, a warning is logged to a [log
topic](#logging). Here's a visual representation:
+
+
+
+If you were implementing this routing functionality in Python, it might look
something like this:
+
+```python
+from pulsar import Function
+
+class RoutingFunction(Function):
+ def __init__(self):
+ self.fruits_topic = "persistent://sample/standalone/ns1/fruits"
+ self.vegetables_topic = "persistent://sample/standalone/ns1/vegetables"
+
+ def is_fruit(item):
+ return item in ["apple", "orange", "pear", "other fruits..."]
+
+ def is_vegetable(item):
+ return item in ["carrot", "lettuce", "radish", "other vegetables..."]
+
+ def process(self, item, context):
+ if self.is_fruit(item):
+ context.publish(self.fruits_topic, item)
+ elif self.is_vegetable(item):
+ context.publish(self.vegetables_topic, item)
+ else:
+ warning = "The item {0} is neither a fruit nor a
vegetable".format(item)
+ context.get_logger().warn(warning)
+```
## Command-line interface {#cli}
@@ -88,6 +179,12 @@ You can also mix and match configuration methods by
specifying some function att
Pulsar Functions can currently be written in [Java](../../functions/api#java)
and [Python](../../functions/api#python). Support for additional languages is
coming soon.
+## The Pulsar Functions API {#api}
+
+* Type safe (bytes versus specific types)
+* SerDe (built-in vs. custom)
+* Pulsar messages are always just bytes, but Pulsar Functions handles data
types for you *unless* you need custom types
+
## Function context {#context}
Each Pulsar Function created using the [Pulsar Functions SDK](#sdk) has access
to a context object that both provides:
@@ -95,7 +192,10 @@ Each Pulsar Function created using the [Pulsar Functions
SDK](#sdk) has access t
1. A wide variety of information about the function, including:
* The name of the function
* The {% popover tenant %} and {% popover namespace %} of the function
- * [User-supplied configuration]() values
+ * [User-supplied configuration](#user-config) values
+2. Special functionality, including:
+ * The ability to produce [logs](#logging) to a specified logging topic
+ * The ability to produce [metrics](#metrics)
### Language-native functions {#native}
@@ -103,7 +203,7 @@ Both Java and Python support writing "native" functions,
i.e. Pulsar Functions w
The benefit of native functions is that they don't have any dependencies
beyond what's already available in Java/Python "out of the box." The downside
is that they don't provide access to the function's [context](#context)
-### The Pulsar Functions SDK {#sdk}
+## The Pulsar Functions SDK {#sdk}
If you'd like a Pulsar Function to have access to a [context
object](#context), you can use the Pulsar Functions SDK, available for both
[Java](../api#java-sdk) and [Pythnon](../api#python-sdk).
@@ -239,6 +339,29 @@ public class ConfigMapFunction implements Function<String,
Void> {
}
```
+## Processing guarantees {#guarantees}
+
+The Pulsar Functions feature provides three different messaging semantics that
you can apply to any function:
+
+Delivery semantics | Description
+:------------------|:-------
+**At-most-once** delivery | Each message that is sent to the function will
most likely be processed but also may not be (hence the "at most")
+**At-least-once** delivery | Each message that is sent to the function could
be processed more than once (hence the "at least")
+**Effectively-once** delivery | Each message that is sent to the function will
have one output associated with it
+
+This command, for example, would run a function in [cluster
mode](#cluster-mode) with effectively-once guarantees applied:
+
+```bash
+$ bin/pulsar-admin functions create \
+ --name my-effectively-once-function \
+ --processingGuarantees EFFECTIVELY_ONCE \
+ # Other function configs
+```
+
## Metrics
-Pulsar Functions that use the [Pulsar Functions SDK](#sdk) can publish metrics
to Pulsar. For more information, see [Metrics for Pulsar Functions](../metrics).
\ No newline at end of file
+Pulsar Functions that use the [Pulsar Functions SDK](#sdk) can publish metrics
to Pulsar. For more information, see [Metrics for Pulsar Functions](../metrics).
+
+## State storage
+
+Pulsar Functions use [Apache BookKeeper](https://bookkeeper.apache.org) as a
state storage interface. All Pulsar installations, including local {% popover
standalone %} installations, include a deployment of BookKeeper {% popover
bookies %}.
\ No newline at end of file
diff --git a/site/img/pulsar-functions-overview.png
b/site/img/pulsar-functions-overview.png
new file mode 100644
index 0000000..065046b
Binary files /dev/null and b/site/img/pulsar-functions-overview.png differ
diff --git a/site/img/pulsar-functions-routing-example.png
b/site/img/pulsar-functions-routing-example.png
new file mode 100644
index 0000000..27a1c44
Binary files /dev/null and b/site/img/pulsar-functions-routing-example.png
differ
diff --git a/site/img/pulsar-functions-word-count.png
b/site/img/pulsar-functions-word-count.png
new file mode 100644
index 0000000..ad0c280
Binary files /dev/null and b/site/img/pulsar-functions-word-count.png differ
--
To stop receiving notification emails like this one, please contact
[email protected].