sijie closed pull request #1826: [WIP] Pulsar Functions worker configuration
URL: https://github.com/apache/incubator-pulsar/pull/1826
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/site/_data/config/functions_worker.yaml 
b/site/_data/config/functions_worker.yaml
new file mode 100644
index 0000000000..9da02d05b1
--- /dev/null
+++ b/site/_data/config/functions_worker.yaml
@@ -0,0 +1,82 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+configs:
+- name: workerId
+  description: An identifier for the worker
+  default: standalone
+- name: workerHostname
+  description: The hostname used by the worker daemon
+  default: localhost
+- name: workerPort
+  description: The port used by the worker daemon
+  default: 6750
+- name: functionMetadataTopicName
+  description: The Pulsar topic used for worker daemon metadata transfer
+  default: metadata
+#- name: functionMetadataSnapshotsTopicPath
+#  description: TODO
+#  default: snapshots
+- name: clusterCoordinationTopicName
+  description: The Pulsar topic used for worker daemon cluster coordination
+  default: coordinate
+- name: pulsarFunctionsNamespace
+  description: The Pulsar namespace used for Pulsar-Functions-specific 
functionality
+  default: public/functions
+- name: pulsarFunctionsCluster
+  default: standalone
+- name: pulsarServiceUrl
+  description: The `pulsar` scheme URL for the Pulsar broker associated with 
the worker daemon
+  default: pulsar://localhost:6650
+- name: pulsarWebServiceUrl
+  description: The HTTP service URL for the Pulsar broker associated with the 
worker daemon
+  default: http://localhost:8080
+- name: numFunctionPackageReplicas
+  description: The number of replicas of the function package (i.e. the code 
resources for the function) to store
+  default: 1
+- name: downloadDirectory
+  description: The directory in which function packages are downloaded
+  default: /tmp/pulsar_functions
+- name: processContainerFactory
+  description: Add this parameter (with an optional 
[`logDirectory`](#logDirectory) sub-parameter) if you'd like to use the 
process-based runtime for Pulsar Functions (each function instance is run in 
its own process). This is the default runtime.
+- name: logDirectory
+  description: Optional sub-parameter for the process-based runtime
+- name: threadContainerFactory
+  description: Add this parameter if you'd like to use the thread-based 
runtime for Pulsar Functions (each function instance is run in its own JVM 
thread). The process-based runtime is the default.
+- name: schedulerClassName
+  description: The Java class name for the worker daemon scheduler 
implementation
+  default: org.apache.pulsar.functions.worker.scheduler.RoundRobinScheduler
+- name: functionAssignmentTopicName
+  description: The Pulsar topic used for function-assignment-related tasks
+  default: assignments
+- name: failureCheckFreqMs
+  description: The frequency with which the worker daemon checks for failures 
(in milliseconds)
+  default: 30000
+- name: rescheduleTimeoutMs
+  description: The timeout applied to Pulsar Function reschedule operations 
(in milliseconds)
+  default: 60000
+- name: initialBrokerReconnectMaxRetries
+  description: The maximum allowed number of retries when initializing broker 
reconnect to the worker daemon
+  default: 60
+- name: assignmentWriteMaxRetries
+  description: The maximum allowed number of retries when attempting to assign 
functions
+  default: 60
+- name: instanceLivenessCheckFreqMs
+  description: The frequency with which the worker daemon checks the liveness 
of Pulsar Function instances (in milliseconds)
+  default: 30000
\ No newline at end of file
diff --git a/site/docs/latest/functions/deployment.md 
b/site/docs/latest/functions/deployment.md
index c0871bfa79..5ac5c80878 100644
--- a/site/docs/latest/functions/deployment.md
+++ b/site/docs/latest/functions/deployment.md
@@ -208,3 +208,64 @@ Pulsar supports three different [subscription 
types](../../getting-started/Conce
 
 Pulsar Functions can also be assigned a subscription type when you 
[create](#cluster-mode) them or run them [locally](#local-run). In cluster 
mode, the subscription can also be [updated](#updating) after the function has 
been created.
 -->
+
+## The Pulsar Functions worker {#worker}
+
+Deployment of Pulsar Functions is handled by a dedicated worker process that 
runs alongside the Pulsar {% popover broker %}. The Pulsar Functions worker 
starts, runs, and stops [instances](#parallelism) of Pulsar Functions.
+
+### Execution runtimes
+
+The Pulsar Functions worker supports two execution runtimes:
+
+* The [process-based](#process) runtime runs Pulsar Function 
[instances](#parallelism) as separate processes
+* The [thread-based](#thread) runtime runs Pulsar Function instances as 
separate [JVM 
threads](https://docs.oracle.com/javase/tutorial/essential/concurrency/procthread.html).
 Please note that the thread-based runtime is available *only* for 
[Java](../api#java) functions.
+
+You can select the runtime when you start up a Pulsar {% popover broker %} via 
the broker's [configuration](#runtime-config).
+
+{% include admonition.html type="success" title="Other runtimes" content="The 
process-based and thread-based runtimes for Pulsar Functions" %}
+
+#### Process-based runtime {#process}
+
+The process-based runtime for Pulsar Functions runs function 
[instances](#parallelism) in separate processes. For instructions on using the 
process-based runtime, see [below](#using-process).
+
+{% include admonition.html type="info" content="The process-based runtime is 
the **default** for Pulsar Functions." %}
+
+#### Thread-based runtime {#thread}
+
+The thread-based runtime for Pulsar Functions runs function 
[instances](#parallelism) in separate 
[JVM](https://en.wikipedia.org/wiki/Java_virtual_machine) threads. For 
instructions on using the thread-based runtime, see [below](#using-thread).
+
+{% include admonition.html type="warning" title="Java only" content="The 
thread-based runtime can only be used with Pulsar Functions written in 
[Java](../api#java). If you choose the thread-based runtime, you won't be able 
to run non-Java functions." %}
+
+#### Docker runtime (coming soon) {#docker}
+
+A future release of Pulsar will feature a [Docker](https://docker.com)-based 
runtime that runs Pulsar Function instances in Docker containers, which 
facilitates using container orchestration platforms like 
[Kubernetes](https://kubernetes.io).
+
+### Configuration {#runtime-config}
+
+The following configurable parameters are available in the 
[`functions_worker.yml`](../../reference/Configuration#worker) configuration 
file for Pulsar {% popover brokers %}:
+
+{% include config.html id="functions_worker" %}
+
+#### Using the process-based runtime {#using-process}
+
+The process-based runtime for Pulsar Functions is the **default**. In the 
[`functions_worker.yml`](#runtime-config) configuration file, you'll see this 
parameter present:
+
+```yaml
+processContainerFactory:
+  logDirectory:
+```
+
+Leave the `processContainerFactory` parameter in place if you'd like to use the 
process-based runtime. You can also specify a logging directory using the 
`logDirectory` parameter. Here's an example configuration for the process-based 
runtime:
+
+```yaml
+processContainerFactory:
+  logDirectory: /path/to/logging/dir
+```
+
+#### Using the thread-based runtime {#using-thread}
+
+In order to use the thread-based runtime for Pulsar Functions you'll need to 
remove the `processContainerFactory` parameter present by default in the 
`functions_worker.yml` [config file](#runtime-config) and replace it with a 
`threadContainerFactory` parameter as well as a `threadGroupName` 
sub-parameter. Here's an example:
+
+```yaml
+threadContainerFactory:
+  threadGroupName: "Thread Function Container Group"
+```
\ No newline at end of file
diff --git a/site/docs/latest/reference/Configuration.md 
b/site/docs/latest/reference/Configuration.md
index 7fcc58863a..d67efc8f79 100644
--- a/site/docs/latest/reference/Configuration.md
+++ b/site/docs/latest/reference/Configuration.md
@@ -30,6 +30,7 @@ Pulsar configuration can be managed either via a series of 
configuration files c
 * [Client](#client)
 * [Service discovery](#service-discovery)
 * [Configuration store](#configuration-store)
+* [Pulsar Functions worker](#worker)
 * [Log4j](#log4j)
 * [Log4j shell](#log4j-shell)
 * [Standalone](#standalone)
@@ -62,6 +63,12 @@ The [`pulsar-client`](../CliTools#pulsar-client) CLI tool 
can be used to publish
 
 {% include config.html id="configuration-store" %}
 
+## Pulsar Functions worker {#worker}
+
+Configuration for the [worker](../../functions/deployment#worker) process that 
drives [Pulsar Functions](../../functions/overview).
+
+{% include config.html id="functions_worker" %}
+
 ## Log4j
 
 {% include config.html id="log4j" %}
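
For orientation, the parameters documented in this diff could be combined into a worker configuration along the following lines. This is only a sketch: the values are the defaults listed in `site/_data/config/functions_worker.yaml` above, while the flat key layout, the key ordering, and the choice of `/path/to/logging/dir` are assumptions and may not match the file shipped with Pulsar.

```yaml
# Hypothetical functions_worker.yml assembled from the defaults in this PR.
# The layout is an assumption; check the file in your Pulsar distribution.
workerId: standalone
workerHostname: localhost
workerPort: 6750
functionMetadataTopicName: metadata
clusterCoordinationTopicName: coordinate
functionAssignmentTopicName: assignments
pulsarFunctionsNamespace: public/functions
pulsarFunctionsCluster: standalone
pulsarServiceUrl: pulsar://localhost:6650
pulsarWebServiceUrl: http://localhost:8080
numFunctionPackageReplicas: 1
downloadDirectory: /tmp/pulsar_functions
# Process-based runtime (the documented default). To use the thread-based
# runtime, the docs above say to replace this block with
# threadContainerFactory and a threadGroupName sub-parameter.
processContainerFactory:
  logDirectory: /path/to/logging/dir
schedulerClassName: org.apache.pulsar.functions.worker.scheduler.RoundRobinScheduler
failureCheckFreqMs: 30000
rescheduleTimeoutMs: 60000
initialBrokerReconnectMaxRetries: 60
assignmentWriteMaxRetries: 60
instanceLivenessCheckFreqMs: 30000
```

Only one of `processContainerFactory` and `threadContainerFactory` would be present at a time, since the documentation describes swapping one for the other.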


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
