candlerb opened a new issue #5521: Python exception message not shown when python function fails to start URL: https://github.com/apache/pulsar/issues/5521 **Describe the bug** I have a problem starting a python function. However, the actual python error message (reported on stderr) is being lost. I can only see it if I "strace" the entire pulsar stack. **To Reproduce** I am using pulsar functions with pulsar 2.4.1 in standalone mode, base OS is Ubuntu 18.04, and I've done the frig to link "python" to "python3": ``` sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10 ``` Here's my function: ``` $ cat func1.py from pulsar import Function class FirstFunction(Function): def process(self, item, context): log = context.get_logger() log.info("Got %r with properties %r" % (item, context.get_message_properties())) ``` I can run it from the command line and it just returns without error. Deployed: ``` $ apache-pulsar-2.4.1/bin/pulsar-admin functions create --name womble --inputs my-topic --py /home/ubuntu/func1.py --classname FirstFunction --log-topic pulsar-log $ apache-pulsar-2.4.1/bin/pulsar-admin functions status --name womble { "numInstances" : 1, "numRunning" : 0, "instances" : [ { "instanceId" : 0, "status" : { "running" : false, "error" : "", "numRestarts" : 222, "numReceived" : 0, "numSuccessfullyProcessed" : 0, "numUserExceptions" : 0, "latestUserExceptions" : [ ], "numSystemExceptions" : 0, "latestSystemExceptions" : [ ], "averageLatency" : 0.0, "lastInvocationTime" : 0, "workerId" : "c-standalone-fw-ldex-pulsar.int.example.net-8080" } } ] } ``` Looking at the pulsar process output, I see it restarting every 30 seconds but no indication of why it's dying. There's a "DeathException" but no further info - not even the process exit code as fas as I can see. ``` 12:39:45.003 [function-timer-thread-94-1] ERROR org.apache.pulsar.functions.runtime.RuntimeSpawner - public/default/womble-java.lang.RuntimeException: Function Container is dead with exception.. restarting 12:39:45.003 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Creating function log directory /home/ubuntu/apache-pulsar-2 .4.1/logs/functions/public/default/womble 12:39:45.003 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Created or found function log directory /home/ubuntu/apache- pulsar-2.4.1/logs/functions/public/default/womble 12:39:45.003 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - ProcessBuilder starting the process with args python /home/u buntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py --py /tmp/pulsar_functions/public/default/womble/0/func1.py --logging_directory /home /ubuntu/apache-pulsar-2.4.1/logs/functions --logging_file womble --logging_config_file /home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini --instance_id 0 --function_id 67fb9828-56ab-49ff-bbbc-56db9510d26c --function_version 4f01d696-28da-40b7-b9f5-2283fbe9bd7c --function_details '{"tenant":"public ","namespace":"default","name":"womble","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpe cs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}' --pulsar_s erviceurl pulsar://ldex-pulsar.int.example.net:6650 --max_buffered_tuples 1024 --port 38423 --metrics_port 36591 --state_storage_serviceurl bk://127.0.0.1:418 1 --expected_healthcheck_interval 30 --secrets_provider secretsprovider.ClearTextSecretsProvider --cluster_name standalone 12:39:45.005 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Started process successfully 2019-10-31 12:39:45.478 INFO ConnectionPool:72 | Created connection for pulsar://ldex-pulsar.int.example.net:6650 2019-10-31 12:39:45.480 INFO ClientConnection:324 | [10.85.179.103:37476 -> 10.85.179.103:6650] Connected to broker 12:39:45.480 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /10.85.179.103:37476 2019-10-31 12:39:45.481 INFO BatchMessageContainer:43 | { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAl lowedNumMessagesInBatch_ = 1000] [topicName = persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 0] [avera geBatchSize = 0]} BatchMessageContainer constructed 2019-10-31 12:39:45.481 INFO HandlerBase:52 | [persistent://public/default/pulsar-log, ] Getting connection from pool 12:39:45.482 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476][persistent://public/default/pulsar-log] Creating producer . producerId=0 12:39:45.483 [ForkJoinPool.commonPool-worker-1] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476]-0 persistent://public/default/pulsar-lo g configured with schema false 12:39:45.483 [ForkJoinPool.commonPool-worker-1] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476] Created new producer: Producer{topic=Pe rsistentTopic{topic=persistent://public/default/pulsar-log}, client=/10.85.179.103:37476, producerName=standalone-1-142, producerId=0} 2019-10-31 12:39:45.483 INFO ProducerImpl:155 | [persistent://public/default/pulsar-log, ] Created producer on broker [10.85.179.103:37476 -> 10.85.179.103:6650 ] 2019-10-31 12:39:45.504 INFO Client:88 | Subscribing on Topic :my-topic 2019-10-31 12:39:45.505 INFO HandlerBase:52 | [persistent://public/default/my-topic, public/default/womble, 0] Getting connection from pool 12:39:45.505 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476] Subscribing on topic persistent://public/default/my-topic / public/default/womble 12:39:45.505 [pulsar-io-50-6] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/my-topic-public%2Fdefault%2Fwomble] Rewind from 36531:29 to 36531:9 12:39:45.506 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic] There are no replicated subscriptions on the topic 12:39:45.506 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic][public/default/womble] C reated new subscription for 0 12:39:45.506 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476] Created subscription on topic persistent://public/default /my-topic / public/default/womble 2019-10-31 12:39:45.506 INFO ConsumerImpl:170 | [persistent://public/default/my-topic, public/default/womble, 0] Created consumer on broker [10.85.179.103:37476 -> 10.85.179.103:6650] 12:39:45.598 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /10.85.179.103:37476 12:39:45.598 [pulsar-io-50-6] INFO org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=Pe rsistentSubscription{topic=persistent://public/default/my-topic, name=public/default/womble}, consumerId=0, consumerName=f01655, address=/10.85.179.103:37476} with pending 20 acks 12:40:00.754 [pulsar-web-57-5] INFO org.eclipse.jetty.server.RequestLog - 10.85.179.103 - - [31/Oct/2019:12:40:00 +0000] "GET /admin/v2/persistent/public/functi ons/coordinate/stats HTTP/1.1" 200 851 "-" "Pulsar-Java-v2.4.1" 2 12:40:15.002 [function-timer-thread-94-1] ERROR org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process death exception java.lang.RuntimeException: at org.apache.pulsar.functions.runtime.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:380) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1] at org.apache.pulsar.functions.runtime.ProcessRuntime.isAlive(ProcessRuntime.java:367) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1] at org.apache.pulsar.functions.runtime.RuntimeSpawner.lambda$start$0(RuntimeSpawner.java:88) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] 12:40:15.003 [function-timer-thread-94-1] ERROR org.apache.pulsar.functions.runtime.RuntimeSpawner - public/default/womble-java.lang.RuntimeException: Function Container is dead with exception.. restarting 12:40:15.003 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Creating function log directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble 12:40:15.003 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Created or found function log directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble 12:40:15.004 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - ProcessBuilder starting the process with args python /home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py --py /tmp/pulsar_functions/public/default/womble/0/func1.py --logging_directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions --logging_file womble --logging_config_file /home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini --instance_id 0 --function_id 67fb9828-56ab-49ff-bbbc-56db9510d26c --function_version 4f01d696-28da-40b7-b9f5-2283fbe9bd7c --function_details '{"tenant":"public","namespace":"default","name":"womble","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}' --pulsar_serviceurl pulsar://ldex-pulsar.int.example.net:6650 --max_buffered_tuples 1024 --port 38423 --metrics_port 36591 --state_storage_serviceurl bk://127.0.0.1:4181 --expected_healthcheck_interval 30 --secrets_provider secretsprovider.ClearTextSecretsProvider --cluster_name standalone 12:40:15.010 [function-timer-thread-94-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Started process successfully 2019-10-31 12:40:15.481 INFO ConnectionPool:72 | Created connection for pulsar://ldex-pulsar.int.example.net:6650 2019-10-31 12:40:15.483 INFO ClientConnection:324 | [10.85.179.103:37928 -> 10.85.179.103:6650] Connected to broker 12:40:15.484 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /10.85.179.103:37928 2019-10-31 12:40:15.485 INFO BatchMessageContainer:43 | { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer constructed 2019-10-31 12:40:15.485 INFO HandlerBase:52 | [persistent://public/default/pulsar-log, ] Getting connection from pool 12:40:15.487 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928][persistent://public/default/pulsar-log] Creating producer. producerId=0 12:40:15.487 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928]-0 persistent://public/default/pulsar-log configured with schema false 12:40:15.488 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/pulsar-log}, client=/10.85.179.103:37928, producerName=standalone-1-143, producerId=0} 2019-10-31 12:40:15.488 INFO ProducerImpl:155 | [persistent://public/default/pulsar-log, ] Created producer on broker [10.85.179.103:37928 -> 10.85.179.103:6650] 2019-10-31 12:40:15.521 INFO Client:88 | Subscribing on Topic :my-topic 2019-10-31 12:40:15.522 INFO HandlerBase:52 | [persistent://public/default/my-topic, public/default/womble, 0] Getting connection from pool 12:40:15.523 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928] Subscribing on topic persistent://public/default/my-topic / public/default/womble 12:40:15.524 [pulsar-io-50-7] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/my-topic-public%2Fdefault%2Fwomble] Rewind from 36531:29 to 36531:9 12:40:15.524 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic] There are no replicated subscriptions on the topic 12:40:15.524 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic][public/default/womble] Created new subscription for 0 12:40:15.524 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928] Created subscription on topic persistent://public/default/my-topic / public/default/womble 2019-10-31 12:40:15.524 INFO ConsumerImpl:170 | [persistent://public/default/my-topic, public/default/womble, 0] Created consumer on broker [10.85.179.103:37928 -> 10.85.179.103:6650] 12:40:15.632 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /10.85.179.103:37928 12:40:15.633 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/my-topic, name=public/default/womble}, consumerId=0, consumerName=b4b94f, address=/10.85.179.103:37928} with pending 20 acks ``` And so on. Using localrun mode isn't any better: ``` $ apache-pulsar-2.4.1/bin/pulsar-admin functions localrun --name womble2 --inputs my-topic --py /home/ubuntu/func1.py --classname FirstFunction --log-topic pulsar-log 12:44:32.943 [main] INFO org.apache.pulsar.functions.runtime.ProcessRuntimeFactory - Java instance jar location is not defined, using the location defined in system environment : /home/ubuntu/apache-pulsar-2.4.1/instances/java-instance.jar 12:44:32.947 [main] INFO org.apache.pulsar.functions.runtime.ProcessRuntimeFactory - Python instance file location is not defined using the location defined in system environment : /home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py 12:44:32.948 [main] INFO org.apache.pulsar.functions.runtime.ProcessRuntimeFactory - No extra dependencies location is defined in either function worker config or system environment 12:44:33.085 [main] INFO org.apache.pulsar.functions.runtime.RuntimeSpawner - public/default/womble2-0 RuntimeSpawner starting function 12:44:33.086 [main] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Creating function log directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2 12:44:33.088 [main] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Created or found function log directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2 12:44:33.088 [main] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - ProcessBuilder starting the process with args python /home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py --py /home/ubuntu/func1.py --logging_directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions --logging_file womble2 --logging_config_file /home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini --instance_id 0 --function_id a776f9d5-db5e-4b59-9b01-ab7661ea0df2 --function_version 48bb96e4-69a9-43f9-b880-7ae89d41b0e3 --function_details '{"tenant":"public","namespace":"default","name":"womble2","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}' --pulsar_serviceurl pulsar://localhost:6650 --use_tls false --tls_allow_insecure false --hostname_verification_enabled false --max_buffered_tuples 1024 --port 46365 --metrics_port 44849 --expected_healthcheck_interval 30 --secrets_provider secretsprovider.ClearTextSecretsProvider --cluster_name local 12:44:33.104 [main] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Started process successfully 2019-10-31 12:44:33.546 INFO ConnectionPool:72 | Created connection for pulsar://localhost:6650 2019-10-31 12:44:33.547 INFO ClientConnection:324 | [127.0.0.1:52338 -> 127.0.0.1:6650] Connected to broker 2019-10-31 12:44:33.549 INFO BatchMessageContainer:43 | { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer constructed 2019-10-31 12:44:33.549 INFO HandlerBase:52 | [persistent://public/default/pulsar-log, ] Getting connection from pool 2019-10-31 12:44:33.550 INFO ConnectionPool:72 | Created connection for pulsar://ldex-pulsar.int.example.net:6650 2019-10-31 12:44:33.550 INFO ClientConnection:326 | [127.0.0.1:52340 -> 127.0.0.1:6650] Connected to broker through proxy. Logical broker: pulsar://ldex-pulsar.int.example.net:6650 2019-10-31 12:44:33.551 INFO ProducerImpl:155 | [persistent://public/default/pulsar-log, ] Created producer on broker [127.0.0.1:52340 -> 127.0.0.1:6650] 2019-10-31 12:44:33.553 INFO Client:88 | Subscribing on Topic :my-topic 2019-10-31 12:44:33.554 INFO HandlerBase:52 | [persistent://public/default/my-topic, public/default/womble2, 0] Getting connection from pool 2019-10-31 12:44:33.626 INFO ConsumerImpl:170 | [persistent://public/default/my-topic, public/default/womble2, 0] Created consumer on broker [127.0.0.1:52340 -> 127.0.0.1:6650] 12:44:33.703 [main] INFO org.apache.pulsar.functions.LocalRunner - RuntimeSpawner quit because of << hangs for 30 seconds here - right after "because of" !! >> 12:45:03.838 [function-timer-thread-3-1] ERROR org.apache.pulsar.functions.runtime.ProcessRuntime - Health check failed for womble2-0 java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_222] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_222] at org.apache.pulsar.functions.runtime.ProcessRuntime.lambda$start$1(ProcessRuntime.java:164) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception at io.grpc.Status.asRuntimeException(Status.java:530) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482) ~[io.grpc-grpc-stub-1.18.0.jar:1.18.0] at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:699) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[io.grpc-grpc-core-1.18.0.jar:1.18.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_222] ... 1 more Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:46365 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_222] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_222] at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] ... 2 more Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_222] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_222] at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] ... 2 more 12:45:03.864 [function-timer-thread-3-1] ERROR org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process death exception java.lang.RuntimeException: at org.apache.pulsar.functions.runtime.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:380) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1] at org.apache.pulsar.functions.runtime.ProcessRuntime.isAlive(ProcessRuntime.java:367) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1] at org.apache.pulsar.functions.runtime.RuntimeSpawner.lambda$start$0(RuntimeSpawner.java:88) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] 12:45:03.866 [function-timer-thread-3-1] ERROR org.apache.pulsar.functions.runtime.RuntimeSpawner - public/default/womble2-java.lang.RuntimeException: Function Container is dead with exception.. restarting 12:45:03.872 [function-timer-thread-3-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Creating function log directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2 12:45:03.874 [function-timer-thread-3-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Created or found function log directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2 12:45:03.875 [function-timer-thread-3-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - ProcessBuilder starting the process with args python /home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py --py /home/ubuntu/func1.py --logging_directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions --logging_file womble2 --logging_config_file /home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini --instance_id 0 --function_id a776f9d5-db5e-4b59-9b01-ab7661ea0df2 --function_version 48bb96e4-69a9-43f9-b880-7ae89d41b0e3 --function_details '{"tenant":"public","namespace":"default","name":"womble2","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}' --pulsar_serviceurl pulsar://localhost:6650 --use_tls false --tls_allow_insecure false --hostname_verification_enabled false --max_buffered_tuples 1024 --port 46365 --metrics_port 44849 --expected_healthcheck_interval 30 --secrets_provider secretsprovider.ClearTextSecretsProvider --cluster_name local 12:45:03.886 [function-timer-thread-3-1] INFO org.apache.pulsar.functions.runtime.ProcessRuntime - Started process successfully 12:45:03.912 [Timer-0] INFO org.apache.pulsar.functions.LocalRunner - { "failureException": "UNAVAILABLE: io exception", "instanceId": "0" } 2019-10-31 12:45:04.859 INFO ConnectionPool:72 | Created connection for pulsar://localhost:6650 2019-10-31 12:45:04.867 INFO ClientConnection:324 | [127.0.0.1:52910 -> 127.0.0.1:6650] Connected to broker 2019-10-31 12:45:04.878 INFO BatchMessageContainer:43 | { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer constructed 2019-10-31 12:45:04.878 INFO HandlerBase:52 | [persistent://public/default/pulsar-log, ] Getting connection from pool 2019-10-31 12:45:04.887 INFO ConnectionPool:72 | Created connection for pulsar://ldex-pulsar.int.example.net:6650 2019-10-31 12:45:04.887 INFO ClientConnection:326 | [127.0.0.1:52912 -> 127.0.0.1:6650] Connected to broker through proxy. Logical broker: pulsar://ldex-pulsar.int.example.net:6650 2019-10-31 12:45:04.890 INFO ProducerImpl:155 | [persistent://public/default/pulsar-log, ] Created producer on broker [127.0.0.1:52912 -> 127.0.0.1:6650] 2019-10-31 12:45:04.900 INFO Client:88 | Subscribing on Topic :my-topic 2019-10-31 12:45:04.902 INFO HandlerBase:52 | [persistent://public/default/my-topic, public/default/womble2, 0] Getting connection from pool 2019-10-31 12:45:04.914 INFO ConsumerImpl:170 | [persistent://public/default/my-topic, public/default/womble2, 0] Created consumer on broker [127.0.0.1:52912 -> 127.0.0.1:6650] ^C12:45:12.279 [Thread-1] INFO org.apache.pulsar.functions.LocalRunner - Shutting down the localrun runtimeSpawner ... ``` Notice the 30 second delay as it was trying to report the exception. Running localrun under strace, finally I can see the backtrace: ``` $ strace -f -s128 apache-pulsar-2.4.1/bin/pulsar-admin functions localrun --name womble2 --inputs my-topic --py /home/ubuntu/func1.py --classname FirstFunction --log-topic pulsar-log 2>/tmp/strace.out ... ... [pid 17173] read(10, "#!/usr/bin/env python3\n#\n# Licensed to the Apache Software Foundation (ASF) under one\n# or more contributor license agreements. "..., 40 96) = 4096 [pid 17173] close(10) = 0 [pid 17173] lseek(9, 0, SEEK_SET) = 0 [pid 17173] lseek(9, 0, SEEK_CUR) = 0 [pid 17173] read(9, "#!/usr/bin/env python3\n#\n# Licensed to the Apache Software Foundation (ASF) under one\n# or more contributor license agreements. "..., 819 2) = 8192 [pid 17173] read(9, "Function Module %s\" % self.instance_config.function_details.className)\n raise NameError(\"Could not import User Function Mod"..., 819 2) = 8192 [pid 17173] close(9) = 0 [pid 17173] getpid() = 17173 [pid 17173] lseek(3, 0, SEEK_END) = 16934 [pid 17173] lseek(3, 0, SEEK_CUR) = 16934 [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: raise NameError(\"Could not import User Function Module %s\" % self.instance_config.fu"..., 154 ) = 154 [pid 17173] getpid() = 17173 [pid 17173] lseek(3, 0, SEEK_END) = 17088 [pid 17173] lseek(3, 0, SEEK_CUR) = 17088 [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: NameError\n", 54) = 54 [pid 17173] getpid() = 17173 [pid 17173] lseek(3, 0, SEEK_END) = 17142 [pid 17173] lseek(3, 0, SEEK_CUR) = 17142 [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: :\n", 46) = 46 [pid 17173] getpid() = 17173 [pid 17173] lseek(3, 0, SEEK_END) = 17188 [pid 17173] lseek(3, 0, SEEK_CUR) = 17188 [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: Could not import User Function Module FirstFunction\n", 96) = 96 [pid 17173] close(3) = 0 ... [pid 17174] read(3, "\312\376\272\276\0\0\0004\0[\7\0*\n\0+\0,\n\0\1\0-\n\0\24\0.\7\0/\n\0\5\0000\t\0\23\0001\n\0002\0003\n\0004\0005\n\0002\0006\t\0\23\0007\n\0 \23\0008\n\0002\0009\t\0:\0;\7\0<\n\0\17\0=\7\0>\n\0\24\0009\7\0?\7\0B\1\0\tcloseLock\1\0\22Ljava/lang/Obje"..., 1711) = 1711 [pid 17158] write(1, "12:48:33.287 [main] INFO org.apache.pulsar.functions.LocalRunner - RuntimeSpawner quit because of\n", 99 <unfinished ...> ``` **Expected behavior** I would expect the stderr which my function generates before it dies, to be captured and displayed in logs somewhere. **Screenshots** See above **Desktop (please complete the following information):** - Ubuntu 18.04 (running in lxd container) **Additional context** The fact that my function doesn't start could be a whole bunch of reasons - `log.py` suggests it's because I'm trying to use log-topic - but that's a separate issue that I need to debug. The issue here is that the message which might help has not been kept. I can see there is other information which is being logged but lost. That error comes from `python_instance.py` which in turn calls `import_class()` from `util.py` ``` def import_class(from_path, full_class_name): from_path = str(from_path) full_class_name = str(full_class_name) try: return import_class_from_path(from_path, full_class_name) except Exception as e: our_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) api_dir = os.path.join(our_dir, PULSAR_API_ROOT, PULSAR_FUNCTIONS_API_ROOT) try: return import_class_from_path(api_dir, full_class_name) except Exception as e: Log.info("Failed to import class %s from path %s" % (full_class_name, from_path)) Log.info(e, exc_info=True) return None ``` When this function returns None there should be a Log.info(). Looking in the strace output I see the detail there: ``` $ grep "Failed to import class" /tmp/strace.out [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [INFO] util.py: Failed to import class FirstFunction from path /home/ubuntu\n", 104) = 104 ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
