Hi Devs,

Recently I've been running experiments with concurrent ingestion and queries on SSDs, and I'd like to share an important lesson from them. In short, *it is very important (from a performance perspective) to use a separate disk for logging, even though SSDs are good at random I/O*.
The following experiment illustrates this point. I was using YCSB with 100GB of base data (100M records of 1KB each). During each experiment, data arrived at a constant rate of 3600 records/s. At the same time, I issued uniformly distributed point lookups as fast as possible using 16 query threads (to saturate the disk). The page size was set to 4KB.

Both experiments were performed on SSDs. The only difference was that one experiment had a separate hard disk for logging, while the other used the same SSD for both the LSM storage and logging. The point lookup throughput over time is plotted below. The negative impact of logging is huge!

[image: image.png]

The reason is that logging needs to force disk writes frequently (in this experiment, the log flusher forced 70-80 times per second). Even though the disk bandwidth used by the log flusher is small (4-5MB/s), the frequent disk forces can seriously reduce the overall disk throughput.

If you have a workload with concurrent data ingestion and queries, please DO consider using a separate disk for logging to fully utilize the SSD bandwidth.

Best regards,
Chen Luo
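
P.S. A minimal Python sketch for anyone who wants a rough feel for the cost of frequent forces on their own disk. This is not the code used in the experiment above. The function name `write_log` and its parameters are made up for illustration, and the figure of 48 records per force is only an estimate derived from the numbers above (3600 records/s divided by roughly 75 forces/s). It appends one second's worth of 1KB records to a file and calls fsync either after every 48 records or once at the end.

```python
import os
import tempfile
import time


def write_log(path, n_records, record_size, records_per_force):
    """Append n_records to path, forcing to disk every records_per_force records.

    Returns (elapsed_seconds, number_of_forces). Hypothetical helper, not
    taken from any real system.
    """
    payload = b"x" * record_size
    forces = 0
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(1, n_records + 1):
            os.write(fd, payload)
            if i % records_per_force == 0:
                os.fsync(fd)  # force buffered writes to the device
                forces += 1
        if n_records % records_per_force != 0:
            os.fsync(fd)  # final force for the trailing partial batch
            forces += 1
    finally:
        os.close(fd)
    return time.perf_counter() - start, forces


if __name__ == "__main__":
    # Assumption: the temporary directory lives on the disk you want to test.
    with tempfile.TemporaryDirectory() as log_dir:
        path = os.path.join(log_dir, "log.bin")
        for per_force in (48, 3600):  # ~75 forces vs. a single force
            elapsed, forces = write_log(path, 3600, 1024, per_force)
            print(f"{forces:3d} forces: {elapsed * 1000:.1f} ms")
```

Note that this only times the log writes in isolation. It does not reproduce the interference between forces and concurrent point lookups, which is what the plot above shows, so treat the numbers as a lower bound on the effect.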
