[
https://issues.apache.org/jira/browse/SPARK-49872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anthony Sgro updated SPARK-49872:
---------------------------------
Description:
There is an issue with the Spark History UI when applications produce large event logs.
The root cause is a breaking change in Jackson that (in the name of
"safety") introduced JSON size limits, see:
[https://github.com/FasterXML/jackson-core/issues/1014]
It looks like {{JSONOptions}} in Spark already [supports configuring this
limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58],
but there seems to be no way to set it globally.
Spark should be able to handle strings of arbitrary length. I have tried
configuring rolling event logs, pruning event logs, etc., but the issue either
is not fixed or causes so much data loss that the Spark History UI is
completely useless.
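For reference, the rolling-event-log mitigation mentioned above uses settings along these lines (the values shown are illustrative defaults, not a recommendation):
{code:java}
# Writer side (the Spark application): split the event log into rolling files
spark.eventLog.enabled=true
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=128m

# History Server side: cap how many rolled files are retained per application
spark.history.fs.eventLog.rolling.maxFilesToRetain=10
{code}
Note that retention via {{maxFilesToRetain}} discards older event-log files outright, which is the data loss described above.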
Perhaps a solution could be to add a config like:
{code:java}
spark.history.server.jsonStreamReadConstraints.maxStringLength=<new_value>
{code}
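Under the hood, such a config could map onto the {{StreamReadConstraints}} API that jackson-core 2.15+ provides, applied wherever the History Server builds the {{ObjectMapper}} used for event-log replay. A minimal sketch (the {{maxLen}} wiring is hypothetical, not existing Spark code):
{code:java}
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical: value read from
// spark.history.server.jsonStreamReadConstraints.maxStringLength
int maxLen = 100_000_000;

// Raise Jackson's per-string limit (20,000,000 by default in recent
// jackson-core releases, as seen in the error message below)
JsonFactory factory = JsonFactory.builder()
    .streamReadConstraints(
        StreamReadConstraints.builder().maxStringLength(maxLen).build())
    .build();

// An ObjectMapper built on this factory accepts strings up to maxLen
ObjectMapper mapper = new ObjectMapper(factory);
{code}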
There is a workaround for reading JSON within your own application:
{code:java}
spark.read.option("maxStringLen", 100000000).json(path)
{code}
But this is not an option for accessing the Spark History UI. Here is the full
stack trace:
{code:java}
HTTP ERROR 500
com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
URI:/history/application_1728009195451_0002/1/jobs/
STATUS:500
MESSAGE:com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
SERVLET:org.apache.spark.deploy.history.HistoryServer$$anon$1-582a764a
CAUSED BY:com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
    at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:324)
    at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
    at com.fasterxml.jackson.core.util.TextBuffer.finishCurrentSegment(TextBuffer.java:939)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2240)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2206)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:323)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer._deserializeContainerNoRecursion(JsonNodeDeserializer.java:572)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:100)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:25)
    at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
    at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4867)
    at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3219)
    at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:927)
    at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:88)
    at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:59)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3(FsHistoryProvider.scala:1143)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3$adapted(FsHistoryProvider.scala:1141)
    at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)
    at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:95)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1141)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1139)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1139)
    at org.apache.spark.deploy.history.FsHistoryProvider.rebuildAppStore(FsHistoryProvider.scala:1120)
    at org.apache.spark.deploy.history.FsHistoryProvider.createInMemoryStore(FsHistoryProvider.scala:1358)
    at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:347)
    at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:199)
    at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
    at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:134)
    at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
    at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:55)
    at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:51)
    at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
    at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
    at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:88)
    at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:100)
    at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:256)
    at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:104)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
    at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
    at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
    at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
    at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.apache.spark.ui.ProxyRedirectHandler.handle(JettyUtils.scala:582)
    at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.sparkproject.jetty.server.Server.handle(Server.java:516)
    at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
    at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
    at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
    at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.lang.Thread.run(Thread.java:750)
{code}
> Spark History UI -- StreamConstraintsException: String length (20054016)
> exceeds the maximum length (20000000)
> --------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-49872
> URL: https://issues.apache.org/jira/browse/SPARK-49872
> Project: Spark
> Issue Type: Bug
> Components: UI
> Affects Versions: 3.5.3
> Reporter: Anthony Sgro
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)