Josh Rosen created SPARK-39489:
----------------------------------

             Summary: Improve EventLoggingListener and ReplayListener
performance by replacing Json4s ASTs with Jackson trees
                 Key: SPARK-39489
                 URL: https://issues.apache.org/jira/browse/SPARK-39489
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen


Spark's event log JsonProtocol currently uses Json4s ASTs to generate and parse 
JSON. Performance overheads from Json4s account for a significant proportion of 
all time spent in JsonProtocol. If we replace Json4s usage with direct usage of 
Jackson APIs then we can significantly improve performance (~2x improvement for 
writing and reading in my own local microbenchmarks).
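
As a rough illustration of what "direct usage of Jackson APIs" could look like,
here is a minimal sketch. It is not the actual JsonProtocol code: DemoEvent and
JacksonSketch are hypothetical names made up for this example, and the field
names are placeholders. Writing streams fields through a JsonGenerator and
reading walks a Jackson JsonNode tree, so no Json4s AST is built in either
direction.

```scala
import java.io.StringWriter
import com.fasterxml.jackson.core.JsonGenerator
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

// Hypothetical event type, for illustration only; not a Spark class.
final case class DemoEvent(name: String, timestamp: Long)

object JacksonSketch {
  private val mapper = new ObjectMapper()

  // Write path: emit fields directly to the output via a streaming
  // generator, without building an intermediate AST first.
  def toJson(event: DemoEvent): String = {
    val writer = new StringWriter()
    val generator: JsonGenerator = mapper.getFactory.createGenerator(writer)
    generator.writeStartObject()
    generator.writeStringField("Event", event.name)
    generator.writeNumberField("Timestamp", event.timestamp)
    generator.writeEndObject()
    generator.close() // flushes the buffered output into the writer
    writer.toString
  }

  // Read path: parse into a Jackson tree and read fields off JsonNode.
  def fromJson(json: String): DemoEvent = {
    val node: JsonNode = mapper.readTree(json)
    DemoEvent(node.get("Event").asText(), node.get("Timestamp").asLong())
  }
}
```

For example, JacksonSketch.fromJson(JacksonSketch.toJson(event)) should return
an event equal to the original. The real change would need to cover every
event type and keep the existing event log format unchanged.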

This performance improvement translates to faster history server load times and 
reduced load on the Spark driver. It also makes it less likely that events are 
dropped because the listener cannot keep up, which in turn reduces the 
likelihood of inconsistent Spark UIs.

Reducing our usage of Json4s is also a step towards being able to eventually 
remove our dependency on Json4s: Spark's current use of Json4s creates library 
conflicts for end users who want to adopt Json4s 4 (see discussion on PRs for 
SPARK-36408). If Spark can eventually remove its Json4s dependency then we will 
completely eliminate such conflicts.



