Biswa Singh created SPARK-37122:
-----------------------------------

             Summary: java.lang.IllegalArgumentException Related to Prometheus
                 Key: SPARK-37122
                 URL: https://issues.apache.org/jira/browse/SPARK-37122
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.1.1, 3.0.2
            Reporter: Biswa Singh


This issue is similar to the one described in this comment on SPARK-35237:
https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723

We receive the following warning:

21:00:26.277 [rpc-server-4-2] WARN  o.a.s.n.s.TransportChannelHandler - Exception in connection from /10.198.3.179:51184
java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
    at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
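
The frame size in the warning can be decoded to see what the peer actually sent. The following is a minimal Python sketch. It assumes (this is not stated in the report) that Spark's TransportFrameDecoder reads the first 8 bytes of a message as a big-endian length and subtracts the 8-byte length field before reporting the value. `decode_reported_frame_size` is a hypothetical helper name.

```python
def decode_reported_frame_size(reported: int, length_field_bytes: int = 8) -> bytes:
    """Recover the raw bytes that were interpreted as a frame length.

    Assumption: the decoder reports (value read from the wire) minus the
    size of the length field, so the field size is added back first.
    """
    raw_value = reported + length_field_bytes
    # The length prefix is assumed to be a big-endian 64-bit integer.
    return raw_value.to_bytes(length_field_bytes, byteorder="big")


# Frame size copied from the warning above.
print(decode_reported_frame_size(5135603447297303916))  # b'GET /met'
```

Under these assumptions the reported size decodes to the ASCII bytes `GET /met`, which is consistent with an HTTP request (for example a scrape of `/metrics`) arriving on a Spark RPC port rather than a genuine oversized Spark frame.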

 

Further details about the Prometheus setup are given below.

 
{noformat}

Prometheus Scrape Configuration
===============================
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2

tcptrack command output in spark3 pod
======================================
10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
10.198.22.240:50354  10.198.40.143:7079  CLOSED 40s 0 B/s
10.198.22.240:33152  10.198.40.143:4040  ESTABLISHED 2s 0 B/s
10.198.22.240:47726  10.198.40.143:8090  ESTABLISHED 9s 0 B/s

10.198.22.240 = prometheus pod ip
10.198.40.143 = testpod ip

Issue
======
Although the scrape config is expected to scrape only port 8090, Prometheus
also tries to scrape other ports on the spark3 pod, such as 7079, 7078, and
4040, which triggers the exception in the spark3 pod. Is this really a
Prometheus issue, or is something wrong on the Spark side? We don't see this
exception in any of the other pods. All our pods, including spark3, are
annotated with:

annotations:
   prometheus.io/port: "8090"
   prometheus.io/scrape: "true"

The metrics are collected correctly; the only problem is the extra warning
caused by this exception.{noformat}
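
The last relabel rule in the scrape configuration above can be traced by hand. The Python sketch below imitates how a `replace` rule is evaluated, assuming the documented Prometheus behaviour: source label values are joined with `;`, the regex is anchored at both ends, a missing label is treated as an empty string, and the target label is left unchanged when the regex does not match. `relabel_address` is a hypothetical helper, not part of Prometheus.

```python
import re

# Regex copied from the last relabel rule in the scrape configuration.
# Prometheus anchors relabel regexes at both ends, so fullmatch is used.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def relabel_address(address: str, port_label: str) -> str:
    """Imitate the `replace` rule that rewrites __address__ ($1:$2)."""
    # Source label values are joined with the default separator ";".
    joined = f"{address};{port_label}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # No match: the rule is skipped and __address__ keeps its value.
        return address
    return f"{match.group(1)}:{match.group(2)}"


# Port label populated from the annotation: the address is rewritten.
print(relabel_address("10.198.40.143:7079", "8090"))
# Port label empty (for example, the label name does not exist).
print(relabel_address("10.198.40.143:7079", ""))
```

If the port label is empty, `__address__` keeps the port that Prometheus discovered for the pod, which would match the connections to 7079 and 4040 in the tcptrack output. Prometheus documents pod annotations as `__meta_kubernetes_pod_annotation_<name>`, so it may be worth checking whether `__meta_kubernetes_pod_prometheus_io_port` in the configuration is ever populated. This is a possible explanation, not a confirmed diagnosis.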
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
