Baohe Zhang created SPARK-32350:
-----------------------------------
Summary: Add batch write support on LevelDB to improve performance
of HybridStore
Key: SPARK-32350
URL: https://issues.apache.org/jira/browse/SPARK-32350
Project: Spark
Issue Type: Improvement
Components: Web UI
Affects Versions: 3.0.1, 3.1.0
Reporter: Baohe Zhang
The idea is to improve the performance of HybridStore by adding batch write
support to LevelDB. https://issues.apache.org/jira/browse/SPARK-31608
introduced HybridStore. HybridStore first writes data to InMemoryStore and,
once writing to InMemoryStore has completed, uses a background thread to dump
the data to LevelDB. In the comments section of
[https://github.com/apache/spark/pull/28412], Mridul Muralidharan suggested
that batch writing could speed up this dumping process, and he wrote the code
for writeAll().
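To make the difference between the two write paths concrete, here is a minimal Java sketch under stated assumptions. `CountingDiskStore`, `dump()` and the commit counter are hypothetical stand-ins, not Spark's kvstore API and not the actual writeAll() code: each commit models the fixed per-write overhead of a disk write, so dumping 1024 entries one by one costs 1024 commits while a single batch costs one. A real LevelDB implementation would more plausibly group the puts into a LevelDB `WriteBatch`; that is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchWriteSketch {

    // Hypothetical stand-in for a disk-backed store such as LevelDB.
    // Each commit models one disk write with its fixed overhead.
    static class CountingDiskStore {
        private final Map<String, String> data = new LinkedHashMap<>();
        private int commits = 0;

        // One-by-one path: every call pays for its own commit.
        void write(String key, String value) {
            data.put(key, value);
            commits++;
        }

        // Batched path: apply all entries, then pay for a single commit.
        void writeAll(Map<String, String> entries) {
            data.putAll(entries);
            commits++;
        }

        int commits() { return commits; }

        int size() { return data.size(); }
    }

    // Copies every entry of the in-memory store to the disk store,
    // either one entry at a time or as a single batch.
    static CountingDiskStore dump(Map<String, String> inMemory, boolean batch) {
        CountingDiskStore disk = new CountingDiskStore();
        if (batch) {
            disk.writeAll(inMemory);
        } else {
            for (Map.Entry<String, String> e : inMemory.entrySet()) {
                disk.write(e.getKey(), e.getValue());
            }
        }
        return disk;
    }

    public static void main(String[] args) throws InterruptedException {
        // Data already written to the in-memory store.
        Map<String, String> inMemory = new LinkedHashMap<>();
        for (int i = 0; i < 1024; i++) {
            inMemory.put("task-" + i, "value-" + i);
        }

        // Dump on a background thread, mirroring HybridStore's switch
        // once the in-memory writes are complete.
        CountingDiskStore[] result = new CountingDiskStore[2];
        Thread dumper = new Thread(() -> {
            result[0] = dump(inMemory, false);
            result[1] = dump(inMemory, true);
        });
        dumper.start();
        dumper.join(); // join() makes the thread's results visible here

        System.out.println("one-by-one commits: " + result[0].commits());
        System.out.println("batched commits: " + result[1].commits());
    }
}
```

Running `main` prints 1024 commits for the one-by-one path and 1 for the batched path. The model deliberately ignores LevelDB internals such as the log and compaction; it only illustrates why amortizing per-write overhead plausibly matters most when each disk write is expensive, as on a busy HDD.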
I compared the HybridStore switching time of one-by-one writes and batch writes
on an HDD. When the disk is idle, batch writing is around 25% faster; when the
disk is 100% busy, batch writing is 7x to 10x faster.
When the disk is at 0% utilization:
||Log size, jobs, tasks per job||Original switching time, with write()||Switching time with writeAll()||
|133m, 400 jobs, 100 tasks per job|16s|13s|
|265m, 400 jobs, 200 tasks per job|30s|23s|
|1.3g, 1000 jobs, 400 tasks per job|136s|108s|
When the disk is at 100% utilization:
||Log size, jobs, tasks per job||Original switching time, with write()||Switching time with writeAll()||
|133m, 400 jobs, 100 tasks per job|116s|17s|
|265m, 400 jobs, 200 tasks per job|251s|26s|
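The headline figures can be checked against the two switching-time tables by computing speedup as the write() time divided by the writeAll() time. The following small Java sketch does that arithmetic; the class name and the row labels are shorthand introduced here, and the numbers are copied from the tables.

```java
public class SwitchingSpeedup {

    // Speedup = original switching time / switching time with writeAll().
    static double speedup(double writeSeconds, double writeAllSeconds) {
        return writeSeconds / writeAllSeconds;
    }

    public static void main(String[] args) {
        String[] labels = {"133m idle", "265m idle", "1.3g idle", "133m busy", "265m busy"};
        double[] writeSec = {16, 30, 136, 116, 251};
        double[] writeAllSec = {13, 23, 108, 17, 26};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-10s %.2fx%n", labels[i], speedup(writeSec[i], writeAllSec[i]));
        }
    }
}
```

It prints about 1.23x, 1.30x and 1.26x for the idle-disk rows and about 6.82x and 9.65x for the busy-disk rows, consistent with the roughly 25% and 7x to 10x figures.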
I also ran the write-related benchmarks in LevelDBBenchmark.java, measuring the
total time to write 1024 objects. These tests were run with the disk at 0%
utilization.
||Benchmark test||with write(), ms||with writeAll(), ms||
|randomUpdatesIndexed|230.386|180.817|
|randomUpdatesNoIndex|58.935|50.113|
|randomWritesIndexed|315.241|254.400|
|randomWritesNoIndex|96.709|41.164|
|sequentialUpdatesIndexed|89.971|70.387|
|sequentialUpdatesNoIndex|72.021|53.769|
|sequentialWritesIndexed|103.052|67.358|
|sequentialWritesNoIndex|76.194|99.037|
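For the LevelDBBenchmark results, similar arithmetic expresses each row as a percentage reduction in total time and flags rows where writeAll() was slower. A sketch in Java; `BenchmarkDelta`, `percentReduction()` and `regressions()` are hypothetical helpers, not part of LevelDBBenchmark.java, and the numbers are copied from the table.

```java
import java.util.ArrayList;
import java.util.List;

public class BenchmarkDelta {

    // Percentage reduction in total time when switching from write() to writeAll().
    // A negative value means writeAll() was slower for that benchmark.
    static double percentReduction(double writeMs, double writeAllMs) {
        return (writeMs - writeAllMs) / writeMs * 100.0;
    }

    // Returns the names of benchmarks where writeAll() took longer than write().
    static List<String> regressions(String[] names, double[] writeMs, double[] writeAllMs) {
        List<String> slower = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (writeAllMs[i] > writeMs[i]) {
                slower.add(names[i]);
            }
        }
        return slower;
    }

    public static void main(String[] args) {
        String[] names = {
            "randomUpdatesIndexed", "randomUpdatesNoIndex", "randomWritesIndexed",
            "randomWritesNoIndex", "sequentialUpdatesIndexed", "sequentialUpdatesNoIndex",
            "sequentialWritesIndexed", "sequentialWritesNoIndex"
        };
        double[] write = {230.386, 58.935, 315.241, 96.709, 89.971, 72.021, 103.052, 76.194};
        double[] writeAll = {180.817, 50.113, 254.400, 41.164, 70.387, 53.769, 67.358, 99.037};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-26s %6.1f%%%n", names[i], percentReduction(write[i], writeAll[i]));
        }
        System.out.println("slower with writeAll(): " + regressions(names, write, writeAll));
    }
}
```

Only sequentialWritesNoIndex comes out negative (about -30%), meaning writeAll() was slower there; the largest reduction is randomWritesNoIndex at about 57%.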
--
This message was sent by Atlassian Jira
(v8.3.4#803005)