Jennifer88huang opened a new issue #6086: [docs] Add doc on how to run Pulsar Functions as pod in Kubernetes
URL: https://github.com/apache/pulsar/issues/6086

**Describe the bug**
Does anyone have clear instructions on how to run Pulsar Functions as pods in Kubernetes?

Addison Higham: @roman Yeah... the docs are lacking, but we have it working. Let me grab our config.

Addison Higham: I assume you are running your broker on k8s? If so, the only config you need in broker.conf is `functionsWorkerEnabled=true`. The rest of the config is all in functions_worker.yml.

Addison Higham: Our copy of that looks like this:

```yaml
assignmentWriteMaxRetries: 60
clusterCoordinationTopicName: coordinate
connectorsDirectory: ./connectors
downloadDirectory: /tmp/pulsar_functions
failureCheckFreqMs: 30000
functionAssignmentTopicName: assignments
functionMetadataTopicName: metadata
initialBrokerReconnectMaxRetries: 60
instanceLivenessCheckFreqMs: 30000
numFunctionPackageReplicas: 1
numHttpServerThreads: 8
secretsProviderConfiguratorClassName: org.apache.pulsar.functions.secretsproviderconfigurator.KubernetesSecretsProviderConfigurator
kubernetesContainerFactory:
  jobNamespace: pulsar
  pulsarDockerImageName: instructure/pulsar-all:2.4.1-inst4
  pulsarServiceUrl: pulsar+ssl://pulsar-beta-broker.pulsar:6651/
  pulsarAdminUrl: https://pulsar-beta-broker.pulsar:8443/
  submittingInsidePod: true
  percentMemoryPadding: 10
pulsarFunctionsCluster: pulsar-beta-iad
pulsarFunctionsNamespace: public/functions-iad
rescheduleTimeoutMs: 60000
schedulerClassName: org.apache.pulsar.functions.worker.scheduler.RoundRobinScheduler
tlsCertRefreshCheckDurationSec: 300
useTls: true # this is the important one
tokenPublicKey: file:///etc/pulsar/jwt/public.key
topicCompactionFrequencySec: 1800
```

Addison Higham: I'll highlight a few important config values that took a while to figure out.

Addison Higham: `secretsProviderConfiguratorClassName` is important because it lets you use secrets in the YAML for functions/IO. Basically, it lets you reference a k8s secret and inject it as an env var, like so:

```yaml
secrets:
  # this isn't the real password! this is a reference to a k8s secret that stores the real password
  MY_PASSWORD:
    path: "my-password" # the name of the k8s secret
    key: "password"     # the key in that secret
```

Addison Higham: The `kubernetesContainerFactory` block is pretty straightforward: we just override the namespace where functions get run, and we use our own Pulsar fork (all our changes are upstreamed; we're just waiting for 2.5 to be released). Technically, I am not sure you need to override the URLs if you aren't using TLS, but if you are using TLS you will want to make sure you specify the TLS endpoints.

Addison Higham: `pulsarFunctionsCluster` and `pulsarFunctionsNamespace` are critical to override if you have geo-replication. Each cluster needs its own namespace. Otherwise, each regional cluster will complain that it doesn't have permission to use the namespace; but if you make the namespace replicated to fix that, then the function worker in each region will try to pick up work from other regions, which is no good :stuck_out_tongue:

Addison Higham: If you are using TLS and want the functions worker to connect over TLS, you MUST set `useTls`. That looks like a bug in the code, since the property is deprecated, but it works for now. Finally, `tokenPublicKey` is needed if you are using token auth, because the functions worker needs to be able to validate JWTs.

Mathieu Druart: @Addison Higham Are you using the state API in your Functions?

Addison Higham: Nope, we tried in 2.4.x and were met with defeat. I think it is maturing a bit more with 2.5; we will try it again once we get there.

Mathieu Druart: We can't figure out how to make persistence work in Functions.

Mathieu Druart: OK, thanks!

Addison Higham: Yeah, 2.5 takes a new version of BookKeeper which has some improvements, but I think most of it was issues on the Pulsar side.
It isn't really well documented yet.

Mathieu Druart: We will try again with 2.5.0 too.

Mathieu Druart: @Addison Higham @sijieg Hi! We tried to use the state API in Pulsar Functions with version 2.5.0, but still no luck, only "State is not enabled." errors (deploying on Kubernetes with the default Helm chart, with `extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent` in the BookKeeper conf file). Any idea? Thanks!

sijieg: @Mathieu Druart There hasn't been much progress on state in Pulsar Functions in 2.5.0. We might put the focus back on this area for the next major release (2.6.x).

Mathieu Druart: @sijieg OK, thanks for the answer.

sijieg: @Addison Higham Do you have any ideas on how we can improve the k8s runtime documentation (http://pulsar.apache.org/docs/en/functions-runtime/#configure-kubernetes-runtime)? Can you suggest a few? @Anonymitaet @Jennifer Huang can incorporate your comments to improve the documentation.

Sandeep Kotagiri: @Addison Higham @sijieg I am extending this thread with some more discussion, and with some failures I observe in my environment when running functions with the Kubernetes runtime. I have configured the Kubernetes runtime in the functions_worker.yml file, and I am able to launch a statefulset/pod to run the function. However, the function fails to run in my environment. In my case I am using TLS for Pulsar, and I am also using TLS authentication. I have figured out how this is failing. The pod starts with the following startup script:
```shell
/pulsar/bin/pulsar-admin --admin-url https://172.16.77.84:8443 functions download \
    --tenant public --namespace default --name firstfunction \
    --destination-file /pulsar/api-examples.jar \
  && SHARD_ID=${POD_NAME##*-} \
  && echo shardId=${SHARD_ID} \
  && exec java -cp /pulsar/instances/java-instance.jar:/pulsar/instances/deps/* \
    -Dpulsar.functions.extra.dependencies.dir=/pulsar/instances/deps \
    -Dpulsar.functions.instance.classpath=/pulsar/conf:::/pulsar/lib/*: \
    -Dlog4j.configurationFile=kubernetes_instance_log4j2.xml \
    -Dpulsar.function.log.dir=logs/functions/public/default/firstfunction \
    -Dpulsar.function.log.file=firstfunction-$SHARD_ID \
    -Xmx1073741824 \
    org.apache.pulsar.functions.instance.JavaInstanceMain \
    --jar /pulsar/api-examples.jar \
    --instance_id $SHARD_ID \
    --function_id 1f74c09f-e96d-4348-b35d-62bbd0d96fce \
    --function_version 74abbe36-cd42-4e22-a404-ed27ee9602a6 \
    --function_details '{"tenant":"public","namespace":"default","name":"firstfunction","className":"org.apache.pulsar.functions.api.examples.ExclamationFunction","autoAck":true,"parallelism":1,"source":{"typeClassName":"java.lang.String","inputSpecs":{"topicA":{}},"cleanupSubscription":true},"sink":{"topic":"persistent://public/default/topicAOut","typeClassName":"java.lang.String"},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}' \
    --pulsar_serviceurl pulsar+ssl://172.16.77.84:6651 \
    --use_tls true \
    --tls_allow_insecure false \
    --hostname_verification_enabled false \
    --tls_trust_cert_path /pulsar/ssl/some_ca.crt \
    --max_buffered_tuples 1024 \
    --port 9093 \
    --metrics_port 9094 \
    --expected_healthcheck_interval -1 \
    --secrets_provider org.apache.pulsar.functions.secretsprovider.EnvironmentBasedSecretsProvider \
    --cluster_name pulsar-itomdipulsar
```

However, this is missing the `--client_auth_plugin` and `--client_auth_parameters` parameters. When I intervene manually and set these parameters, the function runs fine.

Sandeep Kotagiri: I am adding my functions_worker.yml settings here.
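The gap Sandeep describes can be illustrated with a small Python sketch of what the startup command would need: when the worker config sets `clientAuthenticationPlugin` and `clientAuthenticationParameters`, matching `--client_auth_plugin` and `--client_auth_parameters` flags get appended to the instance command. The helper `instance_auth_args` is hypothetical (it is not Pulsar's `RuntimeUtils` code) and assumes the worker config is available as a plain dict.

```python
import shlex


def instance_auth_args(worker_config: dict) -> list:
    """Hypothetical helper: client-auth flags for the function instance command.

    Copies the worker's clientAuthenticationPlugin / clientAuthenticationParameters
    settings onto the instance's --client_auth_plugin / --client_auth_parameters flags.
    """
    args = []
    plugin = worker_config.get("clientAuthenticationPlugin")
    params = worker_config.get("clientAuthenticationParameters")
    if plugin:
        args += ["--client_auth_plugin", plugin]
        # Parameters are only meaningful together with a plugin.
        if params:
            args += ["--client_auth_parameters", params]
    return args


if __name__ == "__main__":
    worker_config = {
        "clientAuthenticationPlugin": "org.apache.pulsar.client.impl.auth.AuthenticationTls",
        "clientAuthenticationParameters": "tlsCertFile:/pulsar/server.crt,tlsKeyFile:/pulsar/server.key",
    }
    # Quote each argument so the flags can be spliced into a shell command safely.
    print(" ".join(shlex.quote(arg) for arg in instance_auth_args(worker_config)))
```

Quoting each argument matters because authentication parameters can contain quotes, spaces, or braces that the shell would otherwise split or interpret.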
Sandeep Kotagiri:

```yaml
assignmentWriteMaxRetries: 60
authenticationEnabled: false
authenticationProviders: null
authorizationEnabled: false
authorizationProvider: org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
clientAuthenticationParameters: tlsCertFile:/pulsar/server.crt,tlsKeyFile:/pulsar/server.key
clientAuthenticationPlugin: org.apache.pulsar.client.impl.auth.AuthenticationTls
clusterCoordinationTopicName: coordinate
configurationStoreServers: localhost:2181
connectorsDirectory: ./connectors
downloadDirectory: /tmp/pulsar_functions
failureCheckFreqMs: 30000
functionAssignmentTopicName: assignments
functionMetadataTopicName: metadata
initialBrokerReconnectMaxRetries: 60
instanceLivenessCheckFreqMs: 30000
kubernetesContainerFactory:
  customLabels: null
  extraFunctionDependenciesDir: null
  imagePullPolicy: Always
  jobNamespace: sandeep
  k8Uri: null
  percentMemoryPadding: 10
  pulsarAdminUrl: null
  pulsarDockerImageName: pulsar-image:latest
  pulsarRootDir: null
  pulsarServiceUrl: null
  submittingInsidePod: true
numFunctionPackageReplicas: 1
numHttpServerThreads: 8
pulsarFunctionsCluster: pulsar-itomdipulsar
pulsarFunctionsNamespace: public/functions
pulsarServiceUrl: pulsar+ssl://localhost:6651
pulsarWebServiceUrl: https://localhost:8443
rescheduleTimeoutMs: 60000
schedulerClassName: org.apache.pulsar.functions.worker.scheduler.RoundRobinScheduler
secretsProviderConfiguratorClassName: org.apache.pulsar.functions.secretsproviderconfigurator.KubernetesSecretsProviderConfigurator
superUserRoles: null
tlsAllowInsecureConnection: false
tlsCertRefreshCheckDurationSec: 300
tlsCertificateFilePath: /var/run/secrets/boostport.com/server.crt
tlsEnabled: true
tlsKeyFilePath: /var/run/secrets/boostport.com/server.key
tlsTrustCertsFilePath: /var/run/secrets/boostport.com/trustedCAs/RIC_ca.crt
topicCompactionFrequencySec: 1800
useTls: 'true'
workerHostname: localhost
workerId: standalone
workerPort: 6750
workerPortTls: 6751
zooKeeperOperationTimeoutSeconds: 30
zooKeeperSessionTimeoutMillis: 30000
```

Sandeep Kotagiri: This is Pulsar 2.4.2.

Sandeep Kotagiri: I see that the `org.apache.pulsar.functions.runtime.RuntimeUtils` class is missing the code that sets the `client_auth_plugin` and `client_auth_parameters` parameters.

Sandeep Kotagiri: So is this a bug? Or am I supposed to use the secrets functionality via `secretsProviderConfiguratorClassName` in some way? At least, looking at the `JavaInstanceStarter` class tells me otherwise, since it is the `RuntimeUtils` class that is missing these parameters.

Roman: For the documentation, it would definitely help to mention that the settings in the template files (https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/broker-configmap.yaml#L41) need the `PF_` prefix for the configuration to be picked up, like so:

```yaml
PF_containerFactory: k8s
PF_kubernetesContainerFactory_submittingInsidePod: "true"
PF_kubernetesContainerFactory_percentMemoryPadding: "10"
```

It's also not self-evident that the value must be `k8s` for the runtime environment to change (for the script to pick it up, as per https://github.com/apache/pulsar/blob/master/docker/pulsar/scripts/gen-yml-from-env.py#L81).
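Roman's point about the `PF_` prefix can be illustrated with a small Python sketch of how such environment variables might be folded into nested functions_worker.yml keys. This is not the actual gen-yml-from-env.py logic; it assumes that each underscore after the prefix marks one nesting level (as `PF_kubernetesContainerFactory_submittingInsidePod` suggests) and that `"true"`/`"false"` and integer strings should become booleans and integers. The names `config_from_env` and `parse_scalar` are hypothetical.

```python
import os

PREFIX = "PF_"


def parse_scalar(value: str):
    """Turn "true"/"false" into bools and integer strings into ints (assumption)."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(value)
    except ValueError:
        return value


def config_from_env(environ) -> dict:
    """Fold PF_-prefixed variables into a nested dict; '_' marks a nesting level."""
    config = {}
    for name, raw in sorted(environ.items()):
        if not name.startswith(PREFIX):
            continue  # ignore unrelated variables such as PATH
        keys = name[len(PREFIX):].split("_")
        node = config
        # Walk (and create) intermediate sections, e.g. kubernetesContainerFactory.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_scalar(raw)
    return config


if __name__ == "__main__":
    print(config_from_env(os.environ))
```

With the three variables from Roman's message, this sketch yields `containerFactory: "k8s"` at the top level and a `kubernetesContainerFactory` section holding `submittingInsidePod: True` and `percentMemoryPadding: 10`. Because the config keys are camelCase, splitting on underscores is unambiguous here; a key that itself contained an underscore would need a different scheme.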
