[ 
https://issues.apache.org/jira/browse/BEAM-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642505#comment-16642505
 ] 

Micah Wylde commented on BEAM-5633:
-----------------------------------

At a high level, the issue is that the client sends the server too many illegal 
pings, and the server eventually closes the connection. Because the client 
has no retry logic, it never recovers from this.

Ideally, there'd be two levels of fixes:
 # Update the server or client gRPC configuration to prevent the client from 
sending bad pings or to prevent the server from killing the connection in this 
situation (see [here|https://github.com/grpc/grpc/blob/master/doc/keepalive.md] 
for more documentation of this behavior)
 # Make the logging client more robust, reconnecting when the connection is lost

I haven't been able to find a configuration that solves the first part, so that 
may need more investigation. Regardless, I think it's worth having 
reconnection logic even if that issue is fixed.
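A minimal sketch of the reconnection idea in item 2, assuming the logging client can be reduced to "open a stream, send batches of records on it". This is not Beam or gRPC code: {{LogStream}}, {{ReconnectingLogClient}}, the {{connect}} factory and the backoff parameters are hypothetical, and {{ConnectionError}} stands in for whatever error the real stream raises when the server closes the connection (with gRPC that would be a {{grpc.RpcError}}).

```python
import time
from typing import Callable, List, Optional, Protocol


class LogStream(Protocol):
    """Hypothetical stand-in for the harness's outgoing log stream."""

    def send(self, records: List[str]) -> None: ...


class ReconnectingLogClient:
    """Sketch: retry a batch on a fresh stream when the old one breaks."""

    def __init__(
        self,
        connect: Callable[[], LogStream],
        max_retries: int = 5,
        base_delay_s: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._connect = connect  # opens a new stream (hypothetical factory)
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._sleep = sleep  # injectable so tests need not really wait
        self._stream: Optional[LogStream] = None
        self.connections = 0  # how many streams have been opened

    def send(self, records: List[str]) -> None:
        for attempt in range(self._max_retries + 1):
            try:
                if self._stream is None:
                    # Open lazily: on the first send, or after a broken stream.
                    self._stream = self._connect()
                    self.connections += 1
                self._stream.send(records)
                return
            except ConnectionError:
                # Discard the dead stream so the next attempt reconnects.
                self._stream = None
                if attempt == self._max_retries:
                    raise  # give up loudly instead of silently dropping logs
                # Exponential backoff: 0.5s, 1s, 2s, ... with the defaults.
                self._sleep(self._base_delay_s * (2 ** attempt))
```

The broken stream is dropped and a new one is opened on the next attempt, so a server-side close (such as the one triggered by too many pings) costs one backoff delay instead of all later log output. Re-raising after {{max_retries}} keeps a persistent failure visible rather than hiding it.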

> Python SDK harness logging client failure
> -----------------------------------------
>
>                 Key: BEAM-5633
>                 URL: https://issues.apache.org/jira/browse/BEAM-5633
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>            Reporter: Thomas Weise
>            Assignee: Micah Wylde
>            Priority: Major
>              Labels: portability-flink
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After running a test with a synthetic source for a few minutes, the logging 
> client fails and none of the subsequent log output is forwarded to the runner.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
