Biswa Singh created SPARK-37122:
-----------------------------------

             Summary: java.lang.IllegalArgumentException Related to Prometheus
                 Key: SPARK-37122
                 URL: https://issues.apache.org/jira/browse/SPARK-37122
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.1.1, 3.0.2
            Reporter: Biswa Singh
This issue is similar to https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723. We receive the following warning:

{noformat}
21:00:26.277 [rpc-server-4-2] WARN o.a.s.n.s.TransportChannelHandler - Exception in connection from /10.198.3.179:51184
java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
	at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
	at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
{noformat}

Below are other details related to Prometheus.

Prometheus Scrape Configuration
===============================
{noformat}
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
{noformat}

tcptrack command output in spark3 pod
======================================
{noformat}
10.198.22.240:51258  10.198.40.143:7079  CLOSED       10s  0 B/s
10.198.22.240:51258  10.198.40.143:7079  CLOSED       10s  0 B/s
10.198.22.240:50354  10.198.40.143:7079  CLOSED       40s  0 B/s
10.198.22.240:33152  10.198.40.143:4040  ESTABLISHED   2s  0 B/s
10.198.22.240:47726  10.198.40.143:8090  ESTABLISHED   9s  0 B/s

10.198.22.240 = prometheus pod ip
10.198.40.143 = testpod ip
{noformat}

Issue
======
Although the scrape config is expected to scrape only port 8090, I see Prometheus trying to initiate scrapes on ports such as 7079, 7078, and 4040 on the spark3 pod, hence the exception in the spark3 pod. But is this really a Prometheus issue, or something on the Spark side? We don't see any such exception in any of the other pods.
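For what it's worth, the "Too large frame" value itself suggests plaintext HTTP hitting a non-HTTP Spark port: TransportFrameDecoder reads the first 8 bytes of incoming data as a big-endian frame length, so the opening bytes of an HTTP GET request decode to an absurdly large number. A quick sanity check in Python (the `+ 8` is an assumption that the decoder subtracts its own 8-byte length header before logging, as in the Spark 3.x sources):

```python
# The logged frame size, re-interpreted as the raw bytes that produced it.
# Assumption: TransportFrameDecoder subtracts its 8-byte length header
# before the size check, so we add it back to recover the original bytes.
logged = 5135603447297303916
raw = (logged + 8).to_bytes(8, "big")
print(raw)  # b'GET /met' -- the start of an HTTP GET for a metrics path
```

This is consistent with Prometheus probing the RPC/UI ports (7078/7079/4040) seen in the tcptrack output above.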
All our pods, including spark3, are annotated with:

{noformat}
annotations:
  prometheus.io/port: "8090"
  prometheus.io/scrape: "true"
{noformat}

We get the metrics and everything works fine; we just see this extra warning for the exception.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)