Hi Devs,

Recently I've been running experiments with concurrent ingestion and
queries on SSDs, and I'd like to share an important lesson from them.
In short, *it is very important (from a performance perspective) to use a
separate disk for logging, even though SSDs are good at random I/O*.

The following experiment illustrates this point. I was using YCSB with
100GB of base data (100M records of 1KB each). During each experiment, there
was a constant data arrival rate of 3600 records/s. I executed as many
concurrent point lookups (uniformly distributed) as possible using
16 query threads (to saturate the disk). The page size was set to 4KB. Both
experiments were performed on SSDs. The only difference was that one
experiment had a separate hard disk for logging, while the other used the
same SSD for both the LSM data and logging. The point lookup throughput over
time is plotted below. The negative impact of logging is huge!

[image: image.png]

The reason is that logging needs to frequently force disk writes (in this
experiment, the log flusher forces a write 70-80 times per second). Even
though the disk bandwidth used by the log flusher is small (4-5MB/s), the
frequent disk forces can seriously reduce the overall disk throughput. If you
have a workload with concurrent data ingestion and queries, please DO
consider using a separate disk for logging to fully utilize the SSD bandwidth.
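
To get a rough feel for these numbers on your own hardware, here is a minimal Python sketch. It is not the actual log flusher: the function names, the temporary file, and the 60KB chunk size are assumptions made for illustration only. The first function works out how much data each force carries on average at the rates quoted above. The second appends small chunks to a file and forces each one to disk, so you can time it on the SSD you use for data versus a separate disk.

```python
import os
import tempfile
import time


def bytes_per_force(bandwidth_mb_s, forces_per_s):
    """Average data written per force, in KB (taking 1 MB = 1000 KB)."""
    return bandwidth_mb_s * 1000 / forces_per_s


def forced_appends(path, n_forces, chunk_bytes):
    """Append chunk_bytes to path n_forces times, forcing each append to disk.

    Returns (bytes_written, elapsed_seconds). Hypothetical helper, only meant
    to mimic the "small write followed by a force" pattern described above.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.perf_counter()
    with open(path, "ab") as f:
        for _ in range(n_forces):
            f.write(chunk)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to force the data to the device
            written += chunk_bytes
    return written, time.perf_counter() - start


if __name__ == "__main__":
    # Figures from the experiment: 4-5MB/s at 70-80 forces per second,
    # so each force carries roughly 50-71KB.
    print(bytes_per_force(4, 80), bytes_per_force(5, 70))

    # One second's worth of forces at the upper rate, with an assumed 60KB chunk.
    with tempfile.TemporaryDirectory() as d:
        written, elapsed = forced_appends(os.path.join(d, "log.bin"), 80, 60_000)
        print(f"{written} bytes in {elapsed:.3f}s")
```

Each force carries only tens of KB, so the cost is in the number of forces rather than in the bytes written. Pointing the path at different devices while a read-heavy workload is running should show whether the forces compete with your queries, although the result will depend on the device and file system.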

Best regards,
Chen Luo
