lhotari commented on code in PR #23793:
URL: https://github.com/apache/pulsar/pull/23793#discussion_r1901713225


##########
pip/pip-401.md:
##########
@@ -0,0 +1,93 @@
+# PIP-401: Support set batching configurations for Pulsar Functions&Sources
+
+# Background knowledge
+
+Pulsar Functions and Sources enable the batching feature hard-coded, and also 
set the `batchingMaxPublishDelay` to 10ms, it only
+supports set the `batch-builder` for now, this is not suitable for all the use 
cases, and also not feasible for users.
+
+# Motivation
+
+Support setting batching configurations for Pulsar Functions&Sources, to make 
it more flexible and suitable for users.
+
+# Goals
+
+## In Scope
+
+- Support setting batching configurations for Pulsar Functions&Sources.
+
+# High Level Design
+
+Make users able to enable&disable batching and set batching configurations for 
Pulsar Functions&Sources.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+- Add a new message `BatchingSpec` with below fields in `Function.proto`, and 
add it as a new filed `batchingSpec` to the `ProducerSpec` message
+  - `bool enabled`
+  - `int32 batchingMaxPublishDelayMs`
+  - `int32 roundRobinRouterBatchingPartitionSwitchFrequency`
+  - `int32 batchingMaxMessages`
+  - `int32 batchingMaxBytes`
+  - `string batchBuilder`
+- Add a new class `BatchingConfig` with below fields and add it as a new field 
`batchingConfig` to the `ProducerConfig`:
+  - `bool enabled`
+  - `int batchingMaxPublishDelayMs`
+  - `int roundRobinRouterBatchingPartitionSwitchFrequency`
+  - `int batchingMaxMessages`
+  - `int batchingMaxBytes`
+  - `String batchBuilder`
+
+And related logic also will be added:
+- convert the `batchingSpec` field of the `ProducerSpec` from 
`FunctionDetails` to the `batchingConfig` field of the `ProducerConfig` and 
vice versa
+
+To keep the compatibility, when the `batchingSpec` of the `ProducerSpec` is 
null when creating the `ProducerConfig` from the `ProducerSpec`,
+the `batchingConfig` field will be fallback to: `BatchingConfig(enabled=true, 
batchingMaxPublishDelayMs=10)`.
+
+After the changes, users can pass the batching configurations when creating 
the functions and sources, like below:
+
+```shell
+./bin/pulsar-admin functions create \
+    --tenant public \
+    --namespace default \
+    --name test-java \
+    --className org.apache.pulsar.functions.api.examples.ExclamationFunction \
+    --inputs persistent://public/default/test-java-input \
+    --producer-config '{"batchingConfig": {"enabled": true, 
"batchingMaxPublishDelayMs": 100, 
"roundRobinRouterBatchingPartitionSwitchFrequency": 10, "batchingMaxMessages": 
1000}}' \
+    --jar /pulsar/examples/api-examples.jar
+```

Review Comment:
   Regarding the examples, Please share an example also for creating a 
`source`. Although it seems repetitive, it's useful to have concrete examples 
(https://lists.apache.org/thread/5cg98tpstjkp66t5zf7mtw1gdr0h1gj6). It would 
also be helpful to have Integration tests in the implementation to validate 
that the configuration also works end-to-end.
   
   Besides sharing a command line example, it would be useful to also share a 
yaml config file example. In cases where a function or source has more complex 
configuration, it's usually better to use a yaml config file that is passed 
with with `--function-config-file` or `--source-config-file` command line 
parameter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to