Thanks for sharing, Chen. Very interesting. The image doesn't show up for me, though. Does it show up for others?
Cheers,
Abdullah.

On Wed, Feb 20, 2019 at 1:29 PM Chen Luo <[email protected]> wrote:

> Hi Devs,
>
> Recently I've been running experiments with concurrent ingestions and
> queries on SSDs. I'd like to share an important lesson from my experiments.
> In short, *it is very important (from the performance perspective) to use
> a separate disk for logging, even though SSDs are good at random I/Os*.
>
> The following experiment illustrates this point. I was using YCSB with
> 100GB of base data (100M records, 1KB each). During each experiment, there
> was a constant data arrival process of 3600 records/s. I executed
> as many concurrent point lookups (uniformly distributed) as possible using
> 16 query threads (to saturate the disk). The page size was set to 4KB. The
> experiments were performed on SSDs. The only difference was that one
> experiment had a separate hard disk for logging, while the other used the
> same SSD for both LSM and logging. The point lookup throughput over time
> is plotted below. The negative impact of logging is huge!
>
> [image: image.png]
>
> The reason is that logging needs to frequently force disk writes (in this
> experiment, the log flusher forces 70-80 times per second). Even though the
> disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
> disk forces could seriously impact the overall disk throughput. If you have
> a workload with concurrent data ingestion and queries, please DO consider
> using a separate disk for logging to fully utilize the SSD bandwidth.
>
> Best regards,
> Chen Luo
>
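
As a rough sanity check on the figures quoted above, here is a small back-of-the-envelope sketch. It only does arithmetic on the numbers from the message (3600 records/s, 1KB records, 70-80 forces per second, 4-5MB/s of log bandwidth). The decimal units (1KB = 1000 bytes) and the helper name `bytes_per_force` are my assumptions for illustration, not anything taken from the experiment or the codebase.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the message.
# Assumption: decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB).
KB = 1000
MB = 1000 * KB


def bytes_per_force(bandwidth_bytes_per_s, forces_per_s):
    """Average number of bytes written per forced flush (hypothetical helper)."""
    return bandwidth_bytes_per_s / forces_per_s


# Raw ingest volume: 3600 records/s at 1 KB per record.
ingest_bytes_per_s = 3600 * 1 * KB

# Smallest and largest average payload per force implied by the quoted ranges:
# lowest bandwidth with the highest force rate, and the reverse.
low = bytes_per_force(4 * MB, 80)
high = bytes_per_force(5 * MB, 70)

print(f"ingest volume: {ingest_bytes_per_s / MB:.1f} MB/s")
print(f"payload per force: {low / KB:.0f}-{high / KB:.0f} KB")
```

Under these assumptions the ingest volume comes to about 3.6MB/s, which is in line with the 4-5MB/s of log bandwidth reported, and each force carries only about 50-71KB on average. That fits the explanation in the message: the cost comes from how often the disk is forced, not from how much data is written.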
