[ https://issues.apache.org/jira/browse/SPARK-37122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-37122:
---------------------------------
    Priority: Major  (was: Critical)

> java.lang.IllegalArgumentException Related to Prometheus
> --------------------------------------------------------
>
>                 Key: SPARK-37122
>                 URL: https://issues.apache.org/jira/browse/SPARK-37122
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.1
>            Reporter: Biswa Singh
>            Priority: Major
>
> This issue is similar to 
> https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723.
>  We receive the following warning continuously:
>  
> {noformat}
> 21:00:26.277 [rpc-server-4-2] WARN  o.a.s.n.s.TransportChannelHandler - Exception in connection from /10.198.3.179:51184
> java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
>     at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
>     at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
>     at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Unknown Source)
> {noformat}
>  
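> As a side note (my own reading of the numbers, not part of the original report): the
> frame size in the warning looks like the ASCII bytes of an HTTP request line being
> misread as a length. The RPC frame decoder reads the first 8 bytes of each frame as a
> length prefix, so when a plain-text scrape such as "GET /metrics HTTP/1.1" arrives on
> an RPC port, the bytes "GET /met" get decoded as a huge big-endian long. A minimal
> sketch, assuming the default /metrics path and that the 8-byte length field itself is
> subtracted from the decoded value:
> {code:java}
> // Sketch (illustration only, not Spark code): decode the first 8 bytes of a
> // plain-text HTTP request as a big-endian long, the way a length-prefixed
> // RPC frame decoder would interpret them.
> import java.nio.ByteBuffer;
> import java.nio.charset.StandardCharsets;
> 
> public class FrameSizeDemo {
>     public static void main(String[] args) {
>         // First thing an HTTP scraper sends on the connection.
>         byte[] request = "GET /metrics HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
> 
>         // First 8 bytes ("GET /met") read as a big-endian long.
>         long rawLength = ByteBuffer.wrap(request, 0, 8).getLong();
> 
>         // Subtracting the 8-byte length field yields the exact value from the
>         // warning above: 5135603447297303916 ("Too large frame").
>         System.out.println(rawLength - 8);
>     }
> }
> {code}
>  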
> Below are other details related to Prometheus and my findings; please scroll
> down to see them:
>  
> {noformat}
> Prometheus Scrape Configuration
> ===============================
> - job_name: 'kubernetes-pods'
>       kubernetes_sd_configs:
>         - role: pod
>       relabel_configs:
>         - action: labelmap
>           regex: __meta_kubernetes_pod_label_(.+)
>         - source_labels: [__meta_kubernetes_namespace]
>           action: replace
>           target_label: kubernetes_namespace
>         - source_labels: [__meta_kubernetes_pod_name]
>           action: replace
>           target_label: kubernetes_pod_name
>         - source_labels: 
> [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
>           action: keep
>           regex: true
>         - source_labels: 
> [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
>           action: replace
>           target_label: __scheme__
>           regex: (https?)
>         - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
>           action: replace
>           target_label: __metrics_path__
>           regex: (.+)
>         - source_labels: [__address__, 
> __meta_kubernetes_pod_prometheus_io_port]
>           action: replace
>           target_label: __address__
>           regex: ([^:]+)(?::\d+)?;(\d+)
>           replacement: $1:$2
> tcptrack command output in spark3 pod
> ======================================
> 10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
> 10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
> 10.198.22.240:50354  10.198.40.143:7079  CLOSED 40s 0 B/s
> 10.198.22.240:33152  10.198.40.143:4040  ESTABLISHED 2s 0 B/s
> 10.198.22.240:47726  10.198.40.143:8090  ESTABLISHED 9s 0 B/s
> 10.198.22.240 = prometheus pod ip
> 10.198.40.143 = testpod ip
> Issue
> ======
> Though the scrape config is expected to scrape only port 8090, I see Prometheus
> trying to initiate scrapes on ports like 7079, 7078, 4040, etc. on the spark3
> pod, and hence the exception in the spark3 pod. But is this really a Prometheus
> issue or something on the Spark side? We don't see any such exception in any of
> the other pods. All our pods, including spark3, are annotated with:
> annotations:
>    prometheus.io/port: "8090"
>    prometheus.io/scrape: "true"
> We do get the metrics and everything works fine; the only problem is the extra
> warning for this exception.{noformat}
>  
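> If it helps in triage, a quick way to confirm that any plain HTTP client (not just
> Prometheus) triggers the same warning is to send a single request line to an RPC port
> of the driver pod. A minimal sketch, using the testpod IP and the 7079 port from the
> tcptrack output above (adjust host and port for your environment):
> {code:java}
> // Sketch (illustration only): reproduce the warning by sending a plain HTTP
> // request to the Spark RPC port, as a scraper hitting the wrong port would.
> import java.io.OutputStream;
> import java.net.Socket;
> import java.nio.charset.StandardCharsets;
> 
> public class RpcPortProbe {
>     public static void main(String[] args) throws Exception {
>         String host = "10.198.40.143";  // testpod ip from the report
>         int port = 7079;                // Spark RPC port seen in tcptrack
> 
>         try (Socket socket = new Socket(host, port)) {
>             OutputStream out = socket.getOutputStream();
>             out.write("GET /metrics HTTP/1.1\r\nHost: spark\r\n\r\n"
>                     .getBytes(StandardCharsets.US_ASCII));
>             out.flush();
>             // The driver side should then log the same "Too large frame"
>             // warning for this connection before closing it.
>         }
>     }
> }
> {code}
>  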



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
