I have re-attached the image below. In case it still doesn't show up: with the same SSD used for *LSM + Logging*, the average point lookup throughput is only around *3-4k/s*. When a separate hard disk is used for logging, the average point lookup throughput reaches *30k-40k/s*.

[image: image.png]

Best regards,
Chen Luo

On Thu, Feb 21, 2019 at 10:01 AM abdullah alamoudi <[email protected]> wrote:

> Thanks for sharing Chen, very interesting.
>
> The image doesn't show up for me. Not sure if it shows up for others?
>
> Cheers,
> Abdullah.
>
> On Wed, Feb 20, 2019 at 1:29 PM Chen Luo <[email protected]> wrote:
>
> > Hi Devs,
> >
> > Recently I've been running experiments with concurrent ingestions and
> > queries on SSDs. I'd like to share an important lesson from my
> > experiments. In short, *it is very important (from the performance
> > perspective) to use a separate disk for logging, even though SSDs are
> > good at random I/Os*.
> >
> > The following experiment illustrates this point. I was using YCSB with
> > 100GB base data (100M records of 1KB each). During each experiment,
> > there was a constant data arrival process of 3600 records/s. I executed
> > as many concurrent point lookups (uniformly distributed) as possible
> > using 16 query threads (to saturate the disk). The page size was set to
> > 4KB. The experiments were performed on SSDs. The only difference is
> > that one experiment had a separate hard disk for logging, while the
> > other used the same SSD for both LSM and logging. The point lookup
> > throughput over time was plotted below. The negative impact of logging
> > is huge!
> >
> > [image: image.png]
> >
> > The reason is that logging needs to frequently force disk writes (in
> > this experiment, the log flusher forces 70-80 times per second). Even
> > though the disk bandwidth used by the log flusher is small (4-5MB/s),
> > the frequent disk forces could seriously impact the overall disk
> > throughput. If you have a workload with concurrent data ingestion and
> > queries, please DO consider using a separate disk for logging to fully
> > utilize the SSD bandwidth.
> >
> > Best regards,
> > Chen Luo
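
As a rough back-of-envelope check on the figures quoted above, the Python sketch below works out how much log data and how many ingested records each disk force carries. It uses only numbers stated in the thread; the decimal reading of "1KB" and "4-5MB/s" and all variable and function names are assumptions made for illustration, not measurements or AsterixDB code.

```python
# Figures quoted in the thread; the ranges are approximate readings.
INGEST_RECORDS_PER_S = 3600
RECORD_SIZE_BYTES = 1000                             # "1KB", read as decimal (assumption)
LOG_BANDWIDTH_BYTES_PER_S = (4_000_000, 5_000_000)   # "4-5MB/s", decimal MB (assumption)
FORCES_PER_S = (70, 80)                              # log flusher forces per second


def per_force(per_second, forces):
    """Divide a (low, high) per-second range by a (low, high) forces-per-second range."""
    low = per_second[0] / forces[1]    # smallest amount spread over the most forces
    high = per_second[1] / forces[0]   # largest amount spread over the fewest forces
    return low, high


bytes_per_force = per_force(LOG_BANDWIDTH_BYTES_PER_S, FORCES_PER_S)
records_per_force = per_force(
    (INGEST_RECORDS_PER_S, INGEST_RECORDS_PER_S), FORCES_PER_S
)
raw_ingest_mb_per_s = INGEST_RECORDS_PER_S * RECORD_SIZE_BYTES / 1_000_000

print(f"log data per force: {bytes_per_force[0] / 1000:.0f}-"
      f"{bytes_per_force[1] / 1000:.0f} KB")
print(f"records per force:  {records_per_force[0]:.0f}-{records_per_force[1]:.0f}")
print(f"raw ingest rate:    {raw_ingest_mb_per_s:.1f} MB/s")
```

Under these assumptions each force carries only about 50-71 KB (roughly 45-51 records), and the raw ingest rate of 3.6 MB/s is in line with the reported 4-5MB/s of log bandwidth. That is consistent with the explanation above that the frequency of forces, rather than the volume of log data, is what hurts lookup throughput.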
