Gengliang Wang created SPARK-30964:
--------------------------------------
Summary: Accelerate InMemoryStore with a new index
Key: SPARK-30964
URL: https://issues.apache.org/jira/browse/SPARK-30964
Project: Spark
Issue Type: Improvement
Components: Spark Core, Web UI
Affects Versions: 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang
Spark uses the class `InMemoryStore` as the KV store for the live UI and the
history server (it is the default when no LevelDB file path is provided).
In `InMemoryStore`, all the task data of one application is stored in a
hashmap whose key is the task ID and whose value is the task data. This is fine
for getting or deleting an entry by a given task ID.
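However, the Spark stage UI always shows all the task data of one stage, and
the current implementation finds it by scanning all the values in the hashmap.
The time complexity is O(numOfTasks), where numOfTasks counts every task in the
application rather than only the tasks of that stage.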
Also, when the number of stages exceeds `spark.ui.retainedStages`, Spark does
the same linear scan to find all the task data of the stages to be deleted.
This can be very slow for a large application with many stages and tasks. We
can improve it by allowing the natural key of an entity to have a real parent
index. Then, on each lookup with a parent node provided, Spark can first look
up all the natural keys under that parent (in our case, the task IDs of the
stage) and then fetch the data for those natural keys from the hashmap.
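
To make the idea concrete, here is a minimal sketch of a parent index kept
next to the main hashmap. It is not the actual `InMemoryStore` code: the class
`ParentIndexedStore`, the record `TaskData` and all method names are
hypothetical, and the stage ID stands in for the parent key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposal; not Spark's InMemoryStore.
final class ParentIndexedStore {
    // Hypothetical entity: natural key = taskId, parent key = stageId.
    static final class TaskData {
        final long taskId;
        final int stageId;

        TaskData(long taskId, int stageId) {
            this.taskId = taskId;
            this.stageId = stageId;
        }
    }

    // Main storage: natural key -> entity.
    private final Map<Long, TaskData> byTaskId = new HashMap<>();
    // Parent index: parent key -> natural keys of its children.
    private final Map<Integer, Set<Long>> taskIdsByStage = new HashMap<>();

    void write(TaskData task) {
        TaskData old = byTaskId.put(task.taskId, task);
        if (old != null && old.stageId != task.stageId) {
            unindex(old); // the entity moved to another parent
        }
        taskIdsByStage
            .computeIfAbsent(task.stageId, k -> new LinkedHashSet<>())
            .add(task.taskId);
    }

    TaskData read(long taskId) {
        return byTaskId.get(taskId);
    }

    void delete(long taskId) {
        TaskData old = byTaskId.remove(taskId);
        if (old != null) {
            unindex(old);
        }
    }

    // Visits only the tasks of one stage instead of every task in the map.
    List<TaskData> tasksOfStage(int stageId) {
        Set<Long> ids =
            taskIdsByStage.getOrDefault(stageId, Collections.emptySet());
        List<TaskData> result = new ArrayList<>(ids.size());
        for (long id : ids) {
            result.add(byTaskId.get(id));
        }
        return result;
    }

    // Removes all tasks of a stage; returns how many were removed.
    int deleteStage(int stageId) {
        Set<Long> ids = taskIdsByStage.remove(stageId);
        if (ids == null) {
            return 0;
        }
        for (Long id : ids) {
            byTaskId.remove(id);
        }
        return ids.size();
    }

    private void unindex(TaskData task) {
        Set<Long> ids = taskIdsByStage.get(task.stageId);
        if (ids != null) {
            ids.remove(task.taskId);
            if (ids.isEmpty()) {
                taskIdsByStage.remove(task.stageId);
            }
        }
    }
}
```

With such an index, listing or deleting the tasks of one stage costs time
proportional to the number of tasks in that stage. The trade-off is one extra
set update on every write and delete.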