igalshilman opened a new pull request #143: URL: https://github.com/apache/flink-statefun/pull/143
### This PR adds logging and metrics for remote function invocations

After applying this PR we would have the following additional metrics (per function type):

- `remote-invocation-failures`: count of exceptions raised during the request.
- `remote-invocation-failures`: rate of exceptions thrown.
- `remote-invocation-latency`: a histogram of request duration (the time taken to reach either a failure or a successful result).

<img width="868" alt="image" src="https://user-images.githubusercontent.com/546103/92615041-fc5dca80-f2bc-11ea-8837-6506ef681d0a.png">
<img width="875" alt="image" src="https://user-images.githubusercontent.com/546103/92615129-126b8b00-f2bd-11ea-8945-50c172c254af.png">
<img width="870" alt="image" src="https://user-images.githubusercontent.com/546103/92615211-2911e200-f2bd-11ea-9f14-2973a061ca93.png">
<img width="869" alt="image" src="https://user-images.githubusercontent.com/546103/92615270-39c25800-f2bd-11ea-93e1-89237dfea03c.png">

In addition, a log message with the detailed exception would be written during retries.
```
worker_1 | 2020-09-09 14:58:00,028 WARN org.apache.flink.statefun.flink.core.httpfn.RetryingCallback [] - Retriable exception caught while trying to deliver a message: ToFunctionRequestSummary(address=Address(example, greeter, George), batchSize=1, totalSizeInBytes=142, numberOfStates=1)
worker_1 | java.net.UnknownHostException: python-worker
worker_1 |     at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_265]
worker_1 |     at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_265]
worker_1 |     at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_265]
worker_1 |     at okhttp3.Dns.lambda$static$0(Dns.java:39) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:135) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:187) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.RealCall$AsyncCall.execute(RealCall.java:172) [statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [statefun-flink-distribution.jar:2.2-SNAPSHOT]
worker_1 |     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
worker_1 |     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
worker_1 |     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
```
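
As a rough illustration of the metrics described above, the sketch below shows one way a failure count and a latency measurement could be recorded around a remote call. It is plain Java and not the code in this PR: `RemoteInvocationMetrics` and its methods are hypothetical names, a simple list of durations stands in for a real histogram, and the failure rate is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/**
 * Hypothetical stand-in for per-function-type invocation metrics.
 * This is an illustrative sketch, not the StateFun or Flink metrics API,
 * and it is not thread-safe.
 */
final class RemoteInvocationMetrics {
  // Plays the role of the failure count metric.
  private long failures;
  // Plays the role of the latency histogram: raw durations in nanoseconds.
  private final List<Long> latenciesNanos = new ArrayList<>();

  /** Runs the call and records its duration whether it succeeds or fails. */
  <T> T invoke(Callable<T> call) throws Exception {
    final long start = System.nanoTime();
    try {
      return call.call();
    } catch (Exception e) {
      failures++; // any exception during the request counts as a failure
      throw e;
    } finally {
      // Recorded on both paths, so failed requests also contribute a latency sample.
      latenciesNanos.add(System.nanoTime() - start);
    }
  }

  long failures() {
    return failures;
  }

  int recordedLatencies() {
    return latenciesNanos.size();
  }
}
```

Recording the duration in `finally` mirrors the description of the latency metric as the time to either a failure or a successful result.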
