Josh Rosen created SPARK-39489:
----------------------------------
Summary: Improve EventLoggingListener and ReplayListener
performance by replacing Json4S ASTs with Jackson trees
Key: SPARK-39489
URL: https://issues.apache.org/jira/browse/SPARK-39489
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.0.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Spark's event log JsonProtocol currently uses Json4s ASTs to generate and parse
JSON. Performance overheads from Json4s account for a significant proportion of
all time spent in JsonProtocol. If we replace Json4s usage with direct usage of
Jackson APIs, then we can significantly improve performance (~2x improvement
for both writing and reading in my own local microbenchmarks).
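
To illustrate what "direct usage of Jackson APIs" could look like, here is a
minimal sketch, not the actual JsonProtocol code. It assumes jackson-databind
is on the classpath. The class name, the "SketchTaskEnd" event and its field
names are hypothetical and chosen only for illustration. Writing goes straight
to a JsonGenerator with no intermediate AST, and reading uses a Jackson
JsonNode tree.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.StringWriter;

public class EventJsonSketch {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private static final JsonFactory FACTORY = MAPPER.getFactory();

    // Write a hypothetical event directly to the output, field by field,
    // without first building an intermediate tree of AST nodes.
    static String writeTaskEnd(int stageId, long taskId, String reason) throws IOException {
        StringWriter out = new StringWriter();
        try (JsonGenerator g = FACTORY.createGenerator(out)) {
            g.writeStartObject();
            g.writeStringField("Event", "SketchTaskEnd"); // hypothetical event name
            g.writeNumberField("Stage ID", stageId);
            g.writeNumberField("Task ID", taskId);
            g.writeStringField("Reason", reason);
            g.writeEndObject();
        }
        return out.toString();
    }

    // Parse the event into a Jackson tree and read one field back.
    static long readTaskId(String json) throws IOException {
        JsonNode root = MAPPER.readTree(json);
        return root.get("Task ID").asLong();
    }

    public static void main(String[] args) throws IOException {
        String json = writeTaskEnd(3, 42L, "Success");
        System.out.println(json);
        System.out.println(readTaskId(json));
    }
}
```

In this sketch the only allocation on the write path is the output buffer,
whereas an AST-based approach allocates a node per field before rendering.
Whether that accounts for the measured speedup in JsonProtocol is an
assumption here, not something this example demonstrates.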
This performance improvement translates to faster history server load times and
reduced load on the Spark driver. It also makes it less likely that events are
dropped because the listener cannot keep up, which in turn reduces the
likelihood of inconsistent Spark UIs.
Reducing our usage of Json4s is also a step towards eventually removing our
dependency on it: Spark's current use of Json4s creates library conflicts for
end users who want to adopt Json4s 4 (see discussion on PRs for SPARK-36408).
If Spark can eventually remove its Json4s dependency, then we will completely
eliminate such conflicts.