[ https://issues.apache.org/jira/browse/SPARK-37122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-37122:
---------------------------------
    Priority: Major  (was: Critical)

> java.lang.IllegalArgumentException Related to Prometheus
> --------------------------------------------------------
>
>                 Key: SPARK-37122
>                 URL: https://issues.apache.org/jira/browse/SPARK-37122
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.1
>            Reporter: Biswa Singh
>            Priority: Major
>
> This issue is similar to https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723.
> We receive the following warning continuously:
>
> 21:00:26.277 [rpc-server-4-2] WARN o.a.s.n.s.TransportChannelHandler - Exception in connection from /10.198.3.179:51184
> java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
>   at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
>   at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
>   at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.base/java.lang.Thread.run(Unknown Source)
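>
> For context on what that "frame size" means: TransportFrameDecoder reads the
> first 8 bytes of each incoming message as a big-endian length prefix and
> subtracts the 8-byte length field itself before the size check. If plain HTTP
> text (e.g. a Prometheus scrape) arrives on one of Spark's RPC ports, those
> bytes are ASCII, not a length. A minimal sketch decoding the number from the
> warning above (the class name is just for illustration):
>
> {code:java}
> import java.nio.ByteBuffer;
> import java.nio.charset.StandardCharsets;
>
> public class DecodeFrameSize {
>     public static void main(String[] args) {
>         // "Frame size" reported in the warning above.
>         long reported = 5135603447297303916L;
>         // TransportFrameDecoder subtracts the 8-byte length field before
>         // the check, so add it back to recover the raw bytes on the wire.
>         byte[] raw = ByteBuffer.allocate(8).putLong(reported + 8).array();
>         // Prints "GET /met" -- the start of an HTTP request line.
>         System.out.println(new String(raw, StandardCharsets.US_ASCII));
>     }
> }
> {code}
> That the bytes decode to the start of "GET /met..." is consistent with an
> HTTP scraper connecting to a non-HTTP Spark port, matching the tcptrack
> output below.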
>
> Below are other details related to Prometheus and my findings:
>
> {noformat}
> Prometheus Scrape Configuration
> ===============================
> - job_name: 'kubernetes-pods'
>   kubernetes_sd_configs:
>   - role: pod
>   relabel_configs:
>   - action: labelmap
>     regex: __meta_kubernetes_pod_label_(.+)
>   - source_labels: [__meta_kubernetes_namespace]
>     action: replace
>     target_label: kubernetes_namespace
>   - source_labels: [__meta_kubernetes_pod_name]
>     action: replace
>     target_label: kubernetes_pod_name
>   - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
>     action: keep
>     regex: true
>   - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
>     action: replace
>     target_label: __scheme__
>     regex: (https?)
>   - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
>     action: replace
>     target_label: __metrics_path__
>     regex: (.+)
>   - source_labels: [__address__, __meta_kubernetes_pod_prometheus_io_port]
>     action: replace
>     target_label: __address__
>     regex: ([^:]+)(?::\d+)?;(\d+)
>     replacement: $1:$2
>
> tcptrack command output in spark3 pod
> =====================================
> 10.198.22.240:51258  10.198.40.143:7079  CLOSED       10s  0 B/s
> 10.198.22.240:51258  10.198.40.143:7079  CLOSED       10s  0 B/s
> 10.198.22.240:50354  10.198.40.143:7079  CLOSED       40s  0 B/s
> 10.198.22.240:33152  10.198.40.143:4040  ESTABLISHED   2s  0 B/s
> 10.198.22.240:47726  10.198.40.143:8090  ESTABLISHED   9s  0 B/s
>
> 10.198.22.240 = prometheus pod ip
> 10.198.40.143 = testpod ip
>
> Issue
> =====
> Though the scrape config is expected to scrape only on port 8090, I see
> Prometheus initiating scrapes on ports like 7079, 7078, 4040, etc. on the
> spark3 pod, and hence the exception in the spark3 pod. But is this really a
> Prometheus issue, or something on the Spark side? We don't see any such
> exception in any of the other pods. All our pods, including spark3, are
> annotated with:
>
> annotations:
>   prometheus.io/port: "8090"
>   prometheus.io/scrape: "true"
>
> We get the metrics and everything works fine; we just see the extra warning
> for this exception.{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org