Jennifer88huang opened a new issue #6086: [docs] Add doc on how to run Pulsar 
Functions as pod in Kubernetes
URL: https://github.com/apache/pulsar/issues/6086
 
 
   **Describe the bug**
   Does anyone have clear instructions on how to run Pulsar Functions as pods in 
Kubernetes?
   
   Addison Higham  
   @roman yeah... the docs are lacking, we have it working. Let me grab our 
config
   
   Addison Higham  
   I assume you are running your broker on k8s? If so, then the only config you 
need in broker.conf is functionsWorkerEnabled=true, the rest of the config is 
all in the functions_worker.yml
   
   Addison Higham  
   our copy of that looks like this:
   assignmentWriteMaxRetries: 60
   clusterCoordinationTopicName: coordinate
   connectorsDirectory: ./connectors
   downloadDirectory: /tmp/pulsar_functions
   failureCheckFreqMs: 30000
   functionAssignmentTopicName: assignments
   functionMetadataTopicName: metadata
   initialBrokerReconnectMaxRetries: 60
   instanceLivenessCheckFreqMs: 30000
   numFunctionPackageReplicas: 1
   numHttpServerThreads: 8
   secretsProviderConfiguratorClassName: org.apache.pulsar.functions.secretsproviderconfigurator.KubernetesSecretsProviderConfigurator
   kubernetesContainerFactory:
     jobNamespace: pulsar
     pulsarDockerImageName: instructure/pulsar-all:2.4.1-inst4
     pulsarServiceUrl: pulsar+ssl://pulsar-beta-broker.pulsar:6651/
     pulsarAdminUrl: https://pulsar-beta-broker.pulsar:8443/
     submittingInsidePod: true
     percentMemoryPadding: 10
   pulsarFunctionsCluster: pulsar-beta-iad
   pulsarFunctionsNamespace: public/functions-iad
   rescheduleTimeoutMs: 60000
   schedulerClassName: org.apache.pulsar.functions.worker.scheduler.RoundRobinScheduler
   tlsCertRefreshCheckDurationSec: 300
   useTls: true # this is the important one
   tokenPublicKey: file:///etc/pulsar/jwt/public.key
   topicCompactionFrequencySec: 1800
   
   Addison Higham  
   will highlight a few important config values that took a while to figure out
   
   Addison Higham  
   secretsProviderConfiguratorClassName: that one is important, as it allows you 
to use secrets in your YAML for functions/IO. Basically, it allows you to 
reference a k8s secret and inject it as an env var, like so:
   secrets:
     # this isn't the real password! this is a reference to a k8s secret that
     # stores the real password
     MY_PASSWORD:
       path: "my-password" # the name of the k8s secret
       key: "password" # the key in that secret
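
A minimal sketch of the consuming side, assuming (as the message above says) that the referenced k8s secret ends up as an environment variable named after the key under `secrets:`, here MY_PASSWORD. The helper names `read_secret` and `connect_settings` are hypothetical and not part of any Pulsar API; the optional `environ` argument exists only so the sketch can be exercised outside a pod.

```python
import os


def read_secret(name, environ=None):
    """Return a secret that was injected as an environment variable."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if value is None:
        # Fail loudly: a missing variable usually means the k8s secret
        # name or key in the function config does not match.
        raise KeyError(f"secret {name!r} was not injected into the environment")
    return value


def connect_settings(environ=None):
    # MY_PASSWORD matches the key under `secrets:` in the YAML above.
    password = read_secret("MY_PASSWORD", environ)
    # Hypothetical settings dict; a real function would hand the password
    # to whatever client library it uses.
    return {"user": "app", "password": password}
```

Calling `read_secret("MY_PASSWORD")` inside the function pod would then return the value stored under the `password` key of the `my-password` secret, provided the injection works as described.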
   
   Addison Higham  
   the kubernetesContainerFactory block is pretty straightforward: we just 
override the namespace where functions get run, and we use our own Pulsar 
fork (all our stuff is upstreamed, just waiting for 2.5 to release). 
Technically, I am not sure you need to override the URLs if you aren't using 
TLS, but if you are using TLS you will want to make sure you specify the TLS 
endpoints.
   
   Addison Higham  
   the pulsarFunctionsCluster and pulsarFunctionsNamespace are critical to 
override if you have geo-replication. Each cluster will need its own 
namespace. Otherwise, each regional cluster will complain that it doesn't have 
permission to use the namespace, and if you fix that by making the namespace 
replicated, then each function worker in each region will try to pick up work 
from other regions, which is no good :stuck_out_tongue:
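
One way to keep those two values distinct per region is to derive them from the region name. The sketch below assumes the naming pattern visible in the config above (pulsar-beta-iad, public/functions-iad); the second region `pdx` and both helper names are made up for illustration.

```python
def functions_worker_overrides(cluster_prefix, region, tenant="public"):
    """Build the per-region values for functions_worker.yml."""
    return {
        "pulsarFunctionsCluster": f"{cluster_prefix}-{region}",
        "pulsarFunctionsNamespace": f"{tenant}/functions-{region}",
    }


def assert_distinct_namespaces(overrides_by_region):
    """Raise if two regions would share one functions namespace."""
    owner = {}
    for region, cfg in overrides_by_region.items():
        namespace = cfg["pulsarFunctionsNamespace"]
        if namespace in owner:
            raise ValueError(
                f"{region} and {owner[namespace]} both use {namespace}"
            )
        owner[namespace] = region


# "iad" comes from the config above; "pdx" is a hypothetical second region.
regions = {r: functions_worker_overrides("pulsar-beta", r) for r in ("iad", "pdx")}
assert_distinct_namespaces(regions)
```

Each region's entry can then be merged into that region's functions_worker.yml, so no two workers end up competing for the same namespace.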
   
   Addison Higham  
   if you are using TLS and want the functions worker to connect over TLS, you 
MUST set useTls. It seems like a bug in the code, as that property is 
deprecated, but it works for now. Finally, the tokenPublicKey is needed if you 
are using token auth, as the functions worker needs to be able to validate JWTs.
   
   Mathieu Druart  
   @Addison Higham are you using state API in your Functions ?
   
   Addison Higham  
   nope, we tried in 2.4.x and were met with defeat, I think it is maturing a 
bit more with 2.5, will try it again once we get there
   
   Mathieu Druart  
   we can't figure out how to make persistence work in Functions
   
   Mathieu Druart  
   ok thanks !
   
   Addison Higham  
   yeah, 2.5 takes a new version of BookKeeper which has some improvements, but 
I think most of it was issues on the Pulsar side. It isn't really well 
documented yet
   
   Mathieu Druart  
   We will try again with 2.5.0 too
   
   Mathieu Druart  
   @Addison Higham @sijieg Hi! We tried to use the state API in Pulsar Functions 
with version 2.5.0, but still no luck, only "State is not enabled." errors 
(deploying on Kubernetes with the default Helm chart, with 
extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent 
in the BookKeeper conf file). Any idea? Thanks!
   
   sijieg  
   @Mathieu Druart there is not much progress regarding state in Pulsar 
Functions in 2.5.0. We might put the focus back on this area for the next 
major release (2.6.x releases).
   
   Mathieu Druart  
   @sijieg ok, thanks for the answer
   
   sijieg  
   @Addison Higham: do you have any ideas on how we can improve the k8s runtime 
documentation here: 
http://pulsar.apache.org/docs/en/functions-runtime/#configure-kubernetes-runtime 
? Can you suggest a few? @Anonymitaet @Jennifer Huang can incorporate your 
comments into the documentation.
   
   Sandeep Kotagiri  
   @Addison Higham @sijieg I am extending this thread with some more discussion 
and with some failures I observe in my environment when running functions with 
the Kubernetes runtime. I have configured the Kubernetes runtime in the 
functions_worker.yml file, and I am able to launch a statefulset/pod to run the 
function. However, the function fails to run in my environment. In my case I am 
using TLS for Pulsar, and I am also using TLS authentication. I have figured 
out how this is failing. The pod starts with the following command as its 
startup script:

       /pulsar/bin/pulsar-admin --admin-url https://172.16.77.84:8443 \
         functions download --tenant public --namespace default \
         --name firstfunction --destination-file /pulsar/api-examples.jar &&
       SHARD_ID=${POD_NAME##*-} &&
       echo shardId=${SHARD_ID} &&
       exec java -cp /pulsar/instances/java-instance.jar:/pulsar/instances/deps/* \
         -Dpulsar.functions.extra.dependencies.dir=/pulsar/instances/deps \
         -Dpulsar.functions.instance.classpath=/pulsar/conf:::/pulsar/lib/*: \
         -Dlog4j.configurationFile=kubernetes_instance_log4j2.xml \
         -Dpulsar.function.log.dir=logs/functions/public/default/firstfunction \
         -Dpulsar.function.log.file=firstfunction-$SHARD_ID -Xmx1073741824 \
         org.apache.pulsar.functions.instance.JavaInstanceMain \
         --jar /pulsar/api-examples.jar --instance_id $SHARD_ID \
         --function_id 1f74c09f-e96d-4348-b35d-62bbd0d96fce \
         --function_version 74abbe36-cd42-4e22-a404-ed27ee9602a6 \
         --function_details '{"tenant":"public","namespace":"default","name":"firstfunction","className":"org.apache.pulsar.functions.api.examples.ExclamationFunction","autoAck":true,"parallelism":1,"source":{"typeClassName":"java.lang.String","inputSpecs":{"topicA":{}},"cleanupSubscription":true},"sink":{"topic":"persistent://public/default/topicAOut","typeClassName":"java.lang.String"},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}' \
         --pulsar_serviceurl pulsar+ssl://172.16.77.84:6651 --use_tls true \
         --tls_allow_insecure false --hostname_verification_enabled false \
         --tls_trust_cert_path /pulsar/ssl/some_ca.crt --max_buffered_tuples 1024 \
         --port 9093 --metrics_port 9094 --expected_healthcheck_interval -1 \
         --secrets_provider org.apache.pulsar.functions.secretsprovider.EnvironmentBasedSecretsProvider \
         --cluster_name pulsar-itomdipulsar

   However, this is missing the --client_auth_plugin and 
--client_auth_parameters parameters. When I intervene manually and set these 
parameters, the function seems to run well.
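
A quick way to spot this in other environments is to scan the generated startup command for the two flags. This sketch uses the flag names exactly as written in the message above; check them against JavaInstanceStarter for your Pulsar version. The function name `missing_auth_flags` and the shortened sample command are illustrative only.

```python
import shlex

# Flag names as reported in the thread; treat them as an assumption.
REQUIRED_AUTH_FLAGS = ("--client_auth_plugin", "--client_auth_parameters")


def missing_auth_flags(command):
    """Return the auth flags that do not appear in a startup command."""
    tokens = shlex.split(command)
    return [flag for flag in REQUIRED_AUTH_FLAGS if flag not in tokens]


# Shortened stand-in for the startup command quoted above.
sample = (
    "exec java org.apache.pulsar.functions.instance.JavaInstanceMain "
    "--pulsar_serviceurl pulsar+ssl://172.16.77.84:6651 --use_tls true "
    "--cluster_name pulsar-itomdipulsar"
)
print(missing_auth_flags(sample))
```

Against the sample, both flags are reported as missing; appending them with the plugin class and certificate parameters from functions_worker.yml makes the list empty.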
   
   Sandeep Kotagiri  
   I am adding my functions_worker.yml settings here.
   
   Sandeep Kotagiri  
   assignmentWriteMaxRetries: 60
   authenticationEnabled: false
   authenticationProviders: null
   authorizationEnabled: false
   authorizationProvider: org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
   clientAuthenticationParameters: tlsCertFile:/pulsar/server.crt,tlsKeyFile:/pulsar/server.key
   clientAuthenticationPlugin: org.apache.pulsar.client.impl.auth.AuthenticationTls
   clusterCoordinationTopicName: coordinate
   configurationStoreServers: localhost:2181
   connectorsDirectory: ./connectors
   downloadDirectory: /tmp/pulsar_functions
   failureCheckFreqMs: 30000
   functionAssignmentTopicName: assignments
   functionMetadataTopicName: metadata
   initialBrokerReconnectMaxRetries: 60
   instanceLivenessCheckFreqMs: 30000
   kubernetesContainerFactory:
     customLabels: null
     extraFunctionDependenciesDir: null
     imagePullPolicy: Always
     jobNamespace: sandeep
     k8Uri: null
     percentMemoryPadding: 10
     pulsarAdminUrl: null
     pulsarDockerImageName: pulsar-image:latest
     pulsarRootDir: null
     pulsarServiceUrl: null
     submittingInsidePod: true
   numFunctionPackageReplicas: 1
   numHttpServerThreads: 8
   pulsarFunctionsCluster: pulsar-itomdipulsar
   pulsarFunctionsNamespace: public/functions
   pulsarServiceUrl: pulsar+ssl://localhost:6651
   pulsarWebServiceUrl: https://localhost:8443
   rescheduleTimeoutMs: 60000
   schedulerClassName: org.apache.pulsar.functions.worker.scheduler.RoundRobinScheduler
   secretsProviderConfiguratorClassName: org.apache.pulsar.functions.secretsproviderconfigurator.KubernetesSecretsProviderConfigurator
   superUserRoles: null
   tlsAllowInsecureConnection: false
   tlsCertRefreshCheckDurationSec: 300
   tlsCertificateFilePath: /var/run/secrets/boostport.com/server.crt
   tlsEnabled: true
   tlsKeyFilePath: /var/run/secrets/boostport.com/server.key
   tlsTrustCertsFilePath: /var/run/secrets/boostport.com/trustedCAs/RIC_ca.crt
   topicCompactionFrequencySec: 1800
   useTls: 'true'
   workerHostname: localhost
   workerId: standalone
   workerPort: 6750
   workerPortTls: 6751
   zooKeeperOperationTimeoutSeconds: 30
   zooKeeperSessionTimeoutMillis: 30000
   
   Sandeep Kotagiri  
   This is Pulsar 2.4.2.
   
   Sandeep Kotagiri  
   I see that the org.apache.pulsar.functions.runtime.RuntimeUtils class is 
missing code that sets the client_auth_plugin and client_auth_parameters 
parameters.
   
   Sandeep Kotagiri  
   So is this a bug? Or am I supposed to use the secrets functionality via 
secretsProviderConfiguratorClassName in some appropriate manner? At least, 
looking at the JavaInstanceStarter class seems to tell me otherwise, since the 
RuntimeUtils class is missing these parameters.
   
   Roman 
   For the documentation, it would definitely help to mention that the keys in 
the template file 
(https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/broker-configmap.yaml#L41) 
should have the PF_ prefix for the configuration to be picked up, like so:
    PF_containerFactory: k8s
    PF_kubernetesContainerFactory_submittingInsidePod: "true"
    PF_kubernetesContainerFactory_percentMemoryPadding: "10"
   It’s also not self-evident that the value should be k8s for the runtime 
environment to change (for the script to pick it up, as per 
https://github.com/apache/pulsar/blob/master/docker/pulsar/scripts/gen-yml-from-env.py#L81).
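
To illustrate the naming rule, here is a simplified sketch of how PF_-prefixed environment variables could be folded into a nested config. It is not the actual gen-yml-from-env.py; the underscore-splitting and type-coercion rules are assumptions, and any special handling of containerFactory in the real script is not reproduced.

```python
PF_PREFIX = "PF_"


def coerce(value):
    """Turn "true"/"false" and integer strings into bool/int; keep the rest."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(value)
    except ValueError:
        return value


def apply_pf_env(config, environ):
    """Copy PF_* variables into config, treating "_" as a nesting separator."""
    for name in sorted(environ):
        if not name.startswith(PF_PREFIX):
            continue  # variables without the prefix are ignored
        path = name[len(PF_PREFIX):].split("_")
        node = config
        for part in path[:-1]:
            # Replace missing or null sections with an empty mapping.
            if not isinstance(node.get(part), dict):
                node[part] = {}
            node = node[part]
        node[path[-1]] = coerce(environ[name])
    return config


env = {
    "PF_containerFactory": "k8s",
    "PF_kubernetesContainerFactory_submittingInsidePod": "true",
    "PF_kubernetesContainerFactory_percentMemoryPadding": "10",
    "HOME": "/root",  # no PF_ prefix, so it is skipped
}
print(apply_pf_env({}, env))
```

The point of the sketch is the failure mode Roman describes: a key written without the PF_ prefix is silently skipped, so the worker keeps its default runtime.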
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
