candlerb opened a new issue #5521: Python exception message not shown when 
python function fails to start
URL: https://github.com/apache/pulsar/issues/5521
 
 
   **Describe the bug**
   I have a problem starting a python function.  However, the actual python 
error message (reported on stderr) is being lost.  I can only see it if I 
"strace" the entire pulsar stack.
   
   **To Reproduce**
   I am using pulsar functions with pulsar 2.4.1 in standalone mode, base OS is 
Ubuntu 18.04, and I've done the frig to link "python" to "python3":
   
   ```
   sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
   ```
   
   Here's my function:
   
   ```
   $ cat func1.py
   from pulsar import Function
   
   class FirstFunction(Function):
       def process(self, item, context):
           log = context.get_logger()
           log.info("Got %r with properties %r" % (item, 
context.get_message_properties()))
   ```
   
   I can run it from the command line and it just returns without error.
   
   Deployed:
   
   ```
   $ apache-pulsar-2.4.1/bin/pulsar-admin functions create --name womble 
--inputs my-topic --py /home/ubuntu/func1.py --classname FirstFunction 
--log-topic pulsar-log
   
   $ apache-pulsar-2.4.1/bin/pulsar-admin functions status --name womble
   {
     "numInstances" : 1,
     "numRunning" : 0,
     "instances" : [ {
       "instanceId" : 0,
       "status" : {
         "running" : false,
         "error" : "",
         "numRestarts" : 222,
         "numReceived" : 0,
         "numSuccessfullyProcessed" : 0,
         "numUserExceptions" : 0,
         "latestUserExceptions" : [ ],
         "numSystemExceptions" : 0,
         "latestSystemExceptions" : [ ],
         "averageLatency" : 0.0,
         "lastInvocationTime" : 0,
         "workerId" : "c-standalone-fw-ldex-pulsar.int.example.net-8080"
       }
     } ]
   }
   ```
   
   Looking at the pulsar process output, I see it restarting every 30 seconds 
but no indication of why it's dying.  There's a "DeathException" but no further 
info - not even the process exit code as fas as I can see.
   
   ```
   12:39:45.003 [function-timer-thread-94-1] ERROR 
org.apache.pulsar.functions.runtime.RuntimeSpawner - 
public/default/womble-java.lang.RuntimeException:  Function
   Container is dead with exception.. restarting
   12:39:45.003 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Creating function log 
directory /home/ubuntu/apache-pulsar-2
   .4.1/logs/functions/public/default/womble
   12:39:45.003 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Created or found function 
log directory /home/ubuntu/apache-
   pulsar-2.4.1/logs/functions/public/default/womble
   12:39:45.003 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - ProcessBuilder starting 
the process with args python /home/u
   buntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py 
--py /tmp/pulsar_functions/public/default/womble/0/func1.py --logging_directory 
/home
   /ubuntu/apache-pulsar-2.4.1/logs/functions --logging_file womble 
--logging_config_file 
/home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini
    --instance_id 0 --function_id 67fb9828-56ab-49ff-bbbc-56db9510d26c 
--function_version 4f01d696-28da-40b7-b9f5-2283fbe9bd7c --function_details 
'{"tenant":"public
   
","namespace":"default","name":"womble","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpe
   
cs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}'
 --pulsar_s
   erviceurl pulsar://ldex-pulsar.int.example.net:6650 --max_buffered_tuples 
1024 --port 38423 --metrics_port 36591 --state_storage_serviceurl 
bk://127.0.0.1:418
   1 --expected_healthcheck_interval 30 --secrets_provider 
secretsprovider.ClearTextSecretsProvider --cluster_name standalone
   12:39:45.005 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Started process 
successfully
   2019-10-31 12:39:45.478 INFO  ConnectionPool:72 | Created connection for 
pulsar://ldex-pulsar.int.example.net:6650
   2019-10-31 12:39:45.480 INFO  ClientConnection:324 | [10.85.179.103:37476 -> 
10.85.179.103:6650] Connected to broker
   12:39:45.480 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.ServerCnx - New connection from 
/10.85.179.103:37476
   2019-10-31 12:39:45.481 INFO  BatchMessageContainer:43 | { BatchContainer 
[size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 
131072] [maxAl
   lowedNumMessagesInBatch_ = 1000] [topicName = 
persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 
0] [numberOfBatchesSent = 0] [avera
   geBatchSize = 0]} BatchMessageContainer constructed
   2019-10-31 12:39:45.481 INFO  HandlerBase:52 | 
[persistent://public/default/pulsar-log, ] Getting connection from pool
   12:39:45.482 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.ServerCnx - 
[/10.85.179.103:37476][persistent://public/default/pulsar-log] Creating producer
   . producerId=0
   12:39:45.483 [ForkJoinPool.commonPool-worker-1] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476]-0 
persistent://public/default/pulsar-lo
   g configured with schema false
   12:39:45.483 [ForkJoinPool.commonPool-worker-1] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476] Created new 
producer: Producer{topic=Pe
   rsistentTopic{topic=persistent://public/default/pulsar-log}, 
client=/10.85.179.103:37476, producerName=standalone-1-142, producerId=0}
   2019-10-31 12:39:45.483 INFO  ProducerImpl:155 | 
[persistent://public/default/pulsar-log, ] Created producer on broker 
[10.85.179.103:37476 -> 10.85.179.103:6650
   ]
   2019-10-31 12:39:45.504 INFO  Client:88 | Subscribing on Topic :my-topic
   2019-10-31 12:39:45.505 INFO  HandlerBase:52 | 
[persistent://public/default/my-topic, public/default/womble, 0] Getting 
connection from pool
   12:39:45.505 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476] Subscribing 
on topic persistent://public/default/my-topic
    / public/default/womble
   12:39:45.505 [pulsar-io-50-6] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/my-topic-public%2Fdefault%2Fwomble] Rewind
   from 36531:29 to 36531:9
   12:39:45.506 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/my-topic] There are no replicated
   subscriptions on the topic
   12:39:45.506 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/my-topic][public/default/womble] C
   reated new subscription for 0
   12:39:45.506 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37476] Created 
subscription on topic persistent://public/default
   /my-topic / public/default/womble
   2019-10-31 12:39:45.506 INFO  ConsumerImpl:170 | 
[persistent://public/default/my-topic, public/default/womble, 0] Created 
consumer on broker [10.85.179.103:37476
    -> 10.85.179.103:6650]
   12:39:45.598 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.ServerCnx - Closed connection from 
/10.85.179.103:37476
   12:39:45.598 [pulsar-io-50-6] INFO  
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
 - Removed consumer Consumer{subscription=Pe
   rsistentSubscription{topic=persistent://public/default/my-topic, 
name=public/default/womble}, consumerId=0, consumerName=f01655, 
address=/10.85.179.103:37476} with pending 20 acks
   12:40:00.754 [pulsar-web-57-5] INFO  org.eclipse.jetty.server.RequestLog - 
10.85.179.103 - - [31/Oct/2019:12:40:00 +0000] "GET 
/admin/v2/persistent/public/functi
   ons/coordinate/stats HTTP/1.1" 200 851 "-" "Pulsar-Java-v2.4.1" 2
   12:40:15.002 [function-timer-thread-94-1] ERROR 
org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process death 
exception
   java.lang.RuntimeException:
           at 
org.apache.pulsar.functions.runtime.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:380)
 ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1]
           at 
org.apache.pulsar.functions.runtime.ProcessRuntime.isAlive(ProcessRuntime.java:367)
 ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1]
           at 
org.apache.pulsar.functions.runtime.RuntimeSpawner.lambda$start$0(RuntimeSpawner.java:88)
 ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1]
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_222]
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[?:1.8.0_222]
           at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_222]
           at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [?:1.8.0_222]
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_222]
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_222]
           at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
   12:40:15.003 [function-timer-thread-94-1] ERROR 
org.apache.pulsar.functions.runtime.RuntimeSpawner - 
public/default/womble-java.lang.RuntimeException:  Function Container is dead 
with exception.. restarting
   12:40:15.003 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Creating function log 
directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble
   12:40:15.003 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Created or found function 
log directory 
/home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble
   12:40:15.004 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - ProcessBuilder starting 
the process with args python 
/home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py
 --py /tmp/pulsar_functions/public/default/womble/0/func1.py 
--logging_directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions 
--logging_file womble --logging_config_file 
/home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini 
--instance_id 0 --function_id 67fb9828-56ab-49ff-bbbc-56db9510d26c 
--function_version 4f01d696-28da-40b7-b9f5-2283fbe9bd7c --function_details 
'{"tenant":"public","namespace":"default","name":"womble","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}'
 --pulsar_serviceurl pulsar://ldex-pulsar.int.example.net:6650 
--max_buffered_tuples 1024 --port 38423 --metrics_port 36591 
--state_storage_serviceurl bk://127.0.0.1:4181 --expected_healthcheck_interval 
30 --secrets_provider secretsprovider.ClearTextSecretsProvider --cluster_name 
standalone
   12:40:15.010 [function-timer-thread-94-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Started process 
successfully
   2019-10-31 12:40:15.481 INFO  ConnectionPool:72 | Created connection for 
pulsar://ldex-pulsar.int.example.net:6650
   2019-10-31 12:40:15.483 INFO  ClientConnection:324 | [10.85.179.103:37928 -> 
10.85.179.103:6650] Connected to broker
   12:40:15.484 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.ServerCnx - New connection from 
/10.85.179.103:37928
   2019-10-31 12:40:15.485 INFO  BatchMessageContainer:43 | { BatchContainer 
[size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 
131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = 
persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 
0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer 
constructed
   2019-10-31 12:40:15.485 INFO  HandlerBase:52 | 
[persistent://public/default/pulsar-log, ] Getting connection from pool
   12:40:15.487 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.ServerCnx - 
[/10.85.179.103:37928][persistent://public/default/pulsar-log] Creating 
producer. producerId=0
   12:40:15.487 [ForkJoinPool.commonPool-worker-2] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928]-0 
persistent://public/default/pulsar-log configured with schema false
   12:40:15.488 [ForkJoinPool.commonPool-worker-2] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928] Created new 
producer: 
Producer{topic=PersistentTopic{topic=persistent://public/default/pulsar-log}, 
client=/10.85.179.103:37928, producerName=standalone-1-143, producerId=0}
   2019-10-31 12:40:15.488 INFO  ProducerImpl:155 | 
[persistent://public/default/pulsar-log, ] Created producer on broker 
[10.85.179.103:37928 -> 10.85.179.103:6650]
   2019-10-31 12:40:15.521 INFO  Client:88 | Subscribing on Topic :my-topic
   2019-10-31 12:40:15.522 INFO  HandlerBase:52 | 
[persistent://public/default/my-topic, public/default/womble, 0] Getting 
connection from pool
   12:40:15.523 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928] Subscribing 
on topic persistent://public/default/my-topic / public/default/womble
   12:40:15.524 [pulsar-io-50-7] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/my-topic-public%2Fdefault%2Fwomble] Rewind from 
36531:29 to 36531:9
   12:40:15.524 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/my-topic] There are no replicated subscriptions on 
the topic
   12:40:15.524 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/my-topic][public/default/womble] Created new 
subscription for 0
   12:40:15.524 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/10.85.179.103:37928] Created 
subscription on topic persistent://public/default/my-topic / 
public/default/womble
   2019-10-31 12:40:15.524 INFO  ConsumerImpl:170 | 
[persistent://public/default/my-topic, public/default/womble, 0] Created 
consumer on broker [10.85.179.103:37928 -> 10.85.179.103:6650]
   12:40:15.632 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.ServerCnx - Closed connection from 
/10.85.179.103:37928
   12:40:15.633 [pulsar-io-50-7] INFO  
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
 - Removed consumer 
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/my-topic,
 name=public/default/womble}, consumerId=0, consumerName=b4b94f, 
address=/10.85.179.103:37928} with pending 20 acks
   ```
   
   And so on.
   
   Using localrun mode isn't any better:
   
   ```
   $ apache-pulsar-2.4.1/bin/pulsar-admin functions localrun --name womble2 
--inputs my-topic --py /home/ubuntu/func1.py --classname FirstFunction 
--log-topic pulsar-log
   12:44:32.943 [main] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntimeFactory - Java instance jar 
location is not defined, using the location defined in system environment : 
/home/ubuntu/apache-pulsar-2.4.1/instances/java-instance.jar
   12:44:32.947 [main] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntimeFactory - Python instance 
file location is not defined using the location defined in system environment : 
/home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py
   12:44:32.948 [main] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntimeFactory - No extra 
dependencies location is defined in either function worker config or system 
environment
   12:44:33.085 [main] INFO  org.apache.pulsar.functions.runtime.RuntimeSpawner 
- public/default/womble2-0 RuntimeSpawner starting function
   12:44:33.086 [main] INFO  org.apache.pulsar.functions.runtime.ProcessRuntime 
- Creating function log directory 
/home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2
   12:44:33.088 [main] INFO  org.apache.pulsar.functions.runtime.ProcessRuntime 
- Created or found function log directory 
/home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2
   12:44:33.088 [main] INFO  org.apache.pulsar.functions.runtime.ProcessRuntime 
- ProcessBuilder starting the process with args python 
/home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py
 --py /home/ubuntu/func1.py --logging_directory 
/home/ubuntu/apache-pulsar-2.4.1/logs/functions --logging_file womble2 
--logging_config_file 
/home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini 
--instance_id 0 --function_id a776f9d5-db5e-4b59-9b01-ab7661ea0df2 
--function_version 48bb96e4-69a9-43f9-b880-7ae89d41b0e3 --function_details 
'{"tenant":"public","namespace":"default","name":"womble2","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}'
 --pulsar_serviceurl pulsar://localhost:6650 --use_tls false 
--tls_allow_insecure false --hostname_verification_enabled false 
--max_buffered_tuples 1024 --port 46365 --metrics_port 44849 
--expected_healthcheck_interval 30 --secrets_provider 
secretsprovider.ClearTextSecretsProvider --cluster_name local
   12:44:33.104 [main] INFO  org.apache.pulsar.functions.runtime.ProcessRuntime 
- Started process successfully
   2019-10-31 12:44:33.546 INFO  ConnectionPool:72 | Created connection for 
pulsar://localhost:6650
   2019-10-31 12:44:33.547 INFO  ClientConnection:324 | [127.0.0.1:52338 -> 
127.0.0.1:6650] Connected to broker
   2019-10-31 12:44:33.549 INFO  BatchMessageContainer:43 | { BatchContainer 
[size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 
131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = 
persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 
0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer 
constructed
   2019-10-31 12:44:33.549 INFO  HandlerBase:52 | 
[persistent://public/default/pulsar-log, ] Getting connection from pool
   2019-10-31 12:44:33.550 INFO  ConnectionPool:72 | Created connection for 
pulsar://ldex-pulsar.int.example.net:6650
   2019-10-31 12:44:33.550 INFO  ClientConnection:326 | [127.0.0.1:52340 -> 
127.0.0.1:6650] Connected to broker through proxy. Logical broker: 
pulsar://ldex-pulsar.int.example.net:6650
   2019-10-31 12:44:33.551 INFO  ProducerImpl:155 | 
[persistent://public/default/pulsar-log, ] Created producer on broker 
[127.0.0.1:52340 -> 127.0.0.1:6650]
   2019-10-31 12:44:33.553 INFO  Client:88 | Subscribing on Topic :my-topic
   2019-10-31 12:44:33.554 INFO  HandlerBase:52 | 
[persistent://public/default/my-topic, public/default/womble2, 0] Getting 
connection from pool
   2019-10-31 12:44:33.626 INFO  ConsumerImpl:170 | 
[persistent://public/default/my-topic, public/default/womble2, 0] Created 
consumer on broker [127.0.0.1:52340 -> 127.0.0.1:6650]
   12:44:33.703 [main] INFO  org.apache.pulsar.functions.LocalRunner - 
RuntimeSpawner quit because of
   
   << hangs for 30 seconds here - right after "because of" !! >>
   
   12:45:03.838 [function-timer-thread-3-1] ERROR 
org.apache.pulsar.functions.runtime.ProcessRuntime - Health check failed for 
womble2-0
   java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
       at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
~[?:1.8.0_222]
       at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) 
~[?:1.8.0_222]
       at 
org.apache.pulsar.functions.runtime.ProcessRuntime.lambda$start$1(ProcessRuntime.java:164)
 ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1]
       at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_222]
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[?:1.8.0_222]
       at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_222]
       at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [?:1.8.0_222]
       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_222]
       at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_222]
       at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
   Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
       at io.grpc.Status.asRuntimeException(Status.java:530) 
~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482) 
~[io.grpc-grpc-stub-1.18.0.jar:1.18.0]
       at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:699)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) 
~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) 
~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
 ~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) 
~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) 
~[io.grpc-grpc-core-1.18.0.jar:1.18.0]
       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_222]
       at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_222]
       ... 1 more
   Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: 
Connection refused: /127.0.0.1:46365
       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
~[?:1.8.0_222]
       at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
~[?:1.8.0_222]
       at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) 
~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508) 
~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) 
~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       ... 2 more
   Caused by: java.net.ConnectException: Connection refused
       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
~[?:1.8.0_222]
       at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
~[?:1.8.0_222]
       at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) 
~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508) 
~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) 
~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at 
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
 ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       ... 2 more
   12:45:03.864 [function-timer-thread-3-1] ERROR 
org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process death 
exception
   java.lang.RuntimeException:
       at 
org.apache.pulsar.functions.runtime.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:380)
 ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1]
       at 
org.apache.pulsar.functions.runtime.ProcessRuntime.isAlive(ProcessRuntime.java:367)
 ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1]
       at 
org.apache.pulsar.functions.runtime.RuntimeSpawner.lambda$start$0(RuntimeSpawner.java:88)
 ~[org.apache.pulsar-pulsar-functions-runtime-2.4.1.jar:2.4.1]
       at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_222]
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[?:1.8.0_222]
       at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_222]
       at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [?:1.8.0_222]
       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_222]
       at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_222]
       at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
   12:45:03.866 [function-timer-thread-3-1] ERROR 
org.apache.pulsar.functions.runtime.RuntimeSpawner - 
public/default/womble2-java.lang.RuntimeException:  Function Container is dead 
with exception.. restarting
   12:45:03.872 [function-timer-thread-3-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Creating function log 
directory /home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2
   12:45:03.874 [function-timer-thread-3-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Created or found function 
log directory 
/home/ubuntu/apache-pulsar-2.4.1/logs/functions/public/default/womble2
   12:45:03.875 [function-timer-thread-3-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - ProcessBuilder starting 
the process with args python 
/home/ubuntu/apache-pulsar-2.4.1/instances/python-instance/python_instance_main.py
 --py /home/ubuntu/func1.py --logging_directory 
/home/ubuntu/apache-pulsar-2.4.1/logs/functions --logging_file womble2 
--logging_config_file 
/home/ubuntu/apache-pulsar-2.4.1/conf/functions-logging/logging_config.ini 
--instance_id 0 --function_id a776f9d5-db5e-4b59-9b01-ab7661ea0df2 
--function_version 48bb96e4-69a9-43f9-b880-7ae89d41b0e3 --function_details 
'{"tenant":"public","namespace":"default","name":"womble2","className":"FirstFunction","logTopic":"pulsar-log","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"my-topic":{}},"cleanupSubscription":true},"sink":{},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}'
 --pulsar_serviceurl pulsar://localhost:6650 --use_tls false 
--tls_allow_insecure false --hostname_verification_enabled false 
--max_buffered_tuples 1024 --port 46365 --metrics_port 44849 
--expected_healthcheck_interval 30 --secrets_provider 
secretsprovider.ClearTextSecretsProvider --cluster_name local
   12:45:03.886 [function-timer-thread-3-1] INFO  
org.apache.pulsar.functions.runtime.ProcessRuntime - Started process 
successfully
   12:45:03.912 [Timer-0] INFO  org.apache.pulsar.functions.LocalRunner - {
     "failureException": "UNAVAILABLE: io exception",
     "instanceId": "0"
   }
   2019-10-31 12:45:04.859 INFO  ConnectionPool:72 | Created connection for 
pulsar://localhost:6650
   2019-10-31 12:45:04.867 INFO  ClientConnection:324 | [127.0.0.1:52910 -> 
127.0.0.1:6650] Connected to broker
   2019-10-31 12:45:04.878 INFO  BatchMessageContainer:43 | { BatchContainer 
[size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 
131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = 
persistent://public/default/pulsar-log] [producerName_ = ] [batchSizeInBytes_ = 
0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer 
constructed
   2019-10-31 12:45:04.878 INFO  HandlerBase:52 | 
[persistent://public/default/pulsar-log, ] Getting connection from pool
   2019-10-31 12:45:04.887 INFO  ConnectionPool:72 | Created connection for 
pulsar://ldex-pulsar.int.example.net:6650
   2019-10-31 12:45:04.887 INFO  ClientConnection:326 | [127.0.0.1:52912 -> 
127.0.0.1:6650] Connected to broker through proxy. Logical broker: 
pulsar://ldex-pulsar.int.example.net:6650
   2019-10-31 12:45:04.890 INFO  ProducerImpl:155 | 
[persistent://public/default/pulsar-log, ] Created producer on broker 
[127.0.0.1:52912 -> 127.0.0.1:6650]
   2019-10-31 12:45:04.900 INFO  Client:88 | Subscribing on Topic :my-topic
   2019-10-31 12:45:04.902 INFO  HandlerBase:52 | 
[persistent://public/default/my-topic, public/default/womble2, 0] Getting 
connection from pool
   2019-10-31 12:45:04.914 INFO  ConsumerImpl:170 | 
[persistent://public/default/my-topic, public/default/womble2, 0] Created 
consumer on broker [127.0.0.1:52912 -> 127.0.0.1:6650]
   ^C12:45:12.279 [Thread-1] INFO  org.apache.pulsar.functions.LocalRunner - 
Shutting down the localrun runtimeSpawner ...
   ```
   
   Notice the 30 second delay as it was trying to report the exception.
   
   Running localrun under strace, finally I can see the backtrace:
   
   ```
   $ strace -f -s128 apache-pulsar-2.4.1/bin/pulsar-admin functions localrun 
--name womble2 --inputs my-topic --py /home/ubuntu/func1.py --classname 
FirstFunction --log-topic pulsar-log 2>/tmp/strace.out
   ...
   
   ...
   [pid 17173] read(10, "#!/usr/bin/env python3\n#\n# Licensed to the Apache 
Software Foundation (ASF) under one\n# or more contributor license agreements. 
"..., 40
   96) = 4096
   [pid 17173] close(10)                   = 0
   [pid 17173] lseek(9, 0, SEEK_SET)       = 0
   [pid 17173] lseek(9, 0, SEEK_CUR)       = 0
   [pid 17173] read(9, "#!/usr/bin/env python3\n#\n# Licensed to the Apache 
Software Foundation (ASF) under one\n# or more contributor license agreements. 
"..., 819
   2) = 8192
   [pid 17173] read(9, "Function Module %s\" % 
self.instance_config.function_details.className)\n      raise NameError(\"Could 
not import User Function Mod"..., 819
   2) = 8192
   [pid 17173] close(9)                    = 0
   [pid 17173] getpid()                    = 17173
   [pid 17173] lseek(3, 0, SEEK_END)       = 16934
   [pid 17173] lseek(3, 0, SEEK_CUR)       = 16934
   [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: raise 
NameError(\"Could not import User Function Module %s\" % 
self.instance_config.fu"..., 154
   ) = 154
   [pid 17173] getpid()                    = 17173
   [pid 17173] lseek(3, 0, SEEK_END)       = 17088
   [pid 17173] lseek(3, 0, SEEK_CUR)       = 17088
   [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: 
NameError\n", 54) = 54
   [pid 17173] getpid()                    = 17173
   [pid 17173] lseek(3, 0, SEEK_END)       = 17142
   [pid 17173] lseek(3, 0, SEEK_CUR)       = 17142
   [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: :\n", 46) 
= 46
   [pid 17173] getpid()                    = 17173
   [pid 17173] lseek(3, 0, SEEK_END)       = 17188
   [pid 17173] lseek(3, 0, SEEK_CUR)       = 17188
   [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [ERROR] log.py: Could not 
import User Function Module FirstFunction\n", 96) = 96
   [pid 17173] close(3)                    = 0
   
   ...
   
   [pid 17174] read(3, 
"\312\376\272\276\0\0\0004\0[\7\0*\n\0+\0,\n\0\1\0-\n\0\24\0.\7\0/\n\0\5\0000\t\0\23\0001\n\0002\0003\n\0004\0005\n\0002\0006\t\0\23\0007\n\0
   
\23\0008\n\0002\0009\t\0:\0;\7\0<\n\0\17\0=\7\0>\n\0\24\0009\7\0?\7\0B\1\0\tcloseLock\1\0\22Ljava/lang/Obje"...,
 1711) = 1711
   [pid 17158] write(1, "12:48:33.287 [main] INFO  
org.apache.pulsar.functions.LocalRunner - RuntimeSpawner quit because of\n", 99 
<unfinished ...>
   ```
   
   **Expected behavior**
   I would expect the stderr which my function generates before it dies, to be 
captured and displayed in logs somewhere.
   
   **Screenshots**
   See above
   
   **Desktop (please complete the following information):**
    - Ubuntu 18.04 (running in lxd container)
   
   **Additional context**
   The fact that my function doesn't start could be a whole bunch of reasons - 
`log.py` suggests it's because I'm trying to use log-topic - but that's a 
separate issue that I need to debug.
   
   The issue here is that the message which might help has not been kept.
   
   I can see there is other information which is being logged but lost.  That 
error comes from `python_instance.py` which in turn calls `import_class()` from 
`util.py`
   
   ```
   def import_class(from_path, full_class_name):
     from_path = str(from_path)
     full_class_name = str(full_class_name)
     try:
       return import_class_from_path(from_path, full_class_name)
     except Exception as e:
       our_dir = 
os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
       api_dir = os.path.join(our_dir, PULSAR_API_ROOT, 
PULSAR_FUNCTIONS_API_ROOT)
       try:
         return import_class_from_path(api_dir, full_class_name)
       except Exception as e:
         Log.info("Failed to import class %s from path %s" % (full_class_name, 
from_path))
         Log.info(e, exc_info=True)
         return None
   ```
   
   When this function returns None there should be a Log.info().  Looking in 
the strace output I see the detail there:
   
   ```
   $ grep "Failed to import class" /tmp/strace.out
   [pid 17173] write(3, "[2019-10-31 12:48:33 +0000] [INFO] util.py: Failed to 
import class FirstFunction from path /home/ubuntu\n", 104) = 104
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to